
Do ecological communities co-diversify? An investigation into

the Sarracenia alata pitcher plant system

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in the Graduate School of The Ohio State University

By

Jordan David Satler

Graduate Program in Evolution, Ecology, and Organismal Biology

The Ohio State University

2016

Dissertation Committee:

Dr. Bryan Carstens, Advisor

Dr. Laura Kubatko

Dr. John Freudenstein

Copyright by

Jordan David Satler

2016

Abstract

Interactions among species are driving forces behind the formation, structure, and persistence of ecological communities. The nature of species interactions that characterize communities, however, has long been debated by ecologists, varying from communities as fluid entities to communities as evolving units. For species with obligate interactions (e.g., host and parasite, plant and pollinator), we might expect these ecologically dependent associations to be reflected in a shared evolutionary history, yet relatively few studies have demonstrated this process in nature. To address this central tenet in ecology and evolutionary biology, my research explores co-diversification in the Sarracenia alata pitcher plant system. Sarracenia alata (family Sarraceniaceae) is a carnivorous pitcher plant distributed along the Gulf Coast of the American southeast, with a range bisected by the Mississippi River. Leaves of this plant are tube-shaped and filled with fluid, adapted for the capture and digestion of prey items. The breakdown of prey provides the plant with inorganic compounds, which are necessary in the nutrient-poor habitats where these plants are found. In addition to capturing prey, the plant’s modified leaves harbor a unique biota of associated organisms (i.e., inquilines), diverse species that share ecological relationships and often provide important services (e.g., secreting digestive enzymes) for the plant. My dissertation tests coevolution theory, exploring how a host plant may influence the population genetic structure of associated species. Shared structure would suggest stable ecological relationships through evolutionary time, and would provide evidence that ecologically interacting lineages can evolve as a unit.

I first analyzed DNA sampled directly from the pitcher fluid to identify microorganisms contained within the community (Chapter 2). My results recover a diverse set of taxa found within the pitcher fluid, and demonstrate that roughly half of the small eukaryotes (e.g., fungi) share population genetic structure with S. alata, suggesting a shared evolutionary history among these interacting species. I further explored these questions of co-diversification with six species that interact with the host plant; these species range on the ecological spectrum from obligate symbionts to opportunistic predators (Chapter 3). I first developed a novel method for quantifying phylogeographic congruence (Phylogeographic Concordance Factors; PCFs). Results show that ecological association correlates with a shared evolutionary history, and suggest that multiple arthropods co-diversified with the host pitcher plant. I then tested for simultaneous divergence across the Mississippi River, an important biogeographic barrier in this region, by estimating population splitting times in five arthropods spanning this barrier (Chapter 4). Results suggest that two of the arthropods dispersed across the Mississippi River (from east to west) synchronously with the plant. A third arthropod displays population genetic patterns similar to those of the plant, but is estimated to have dispersed across the Mississippi River at a later time. My dissertation provides evidence for the evolutionary stability of interacting lineages, suggesting that ecological communities co-diversify.


This dissertation is dedicated to my parents, June and Larry Satler, for their love, support, and encouragement as I pursued my interests.


Acknowledgments

First and foremost, I would like to thank my advisor, Dr. Bryan Carstens, for being an outstanding mentor as I pursued my PhD. Bryan has been a wonderful advisor, and has helped me grow as a biologist and as a person. I have appreciated Bryan’s time and efforts with me, and his willingness to listen to my ideas, provide constructive criticism, and help clarify concepts when they didn’t quite make sense. Bryan has also been incredibly patient and understanding with me, and I am tremendously thankful for that. I feel fortunate to have had Bryan as a PhD advisor. I would also like to thank my other committee members, Dr. Laura Kubatko and Dr. John Freudenstein, for their helpful comments, suggestions, and discussion over the years, and for giving their time to this process. In particular, I would like to thank Dr. Laura Kubatko for her fantastic seminars each spring, which covered topics I was eager to learn and discuss, and for explaining difficult concepts in a way that made them understandable. These seminars have been instrumental in my growth. In addition, I would like to thank Dr. Norm Johnson for being a valuable member of my candidacy committee. I appreciated Norm’s questions and comments during this time, and in our discussions over the years.

I would like to thank my funding sources for making much of my project possible. Specifically, I would like to thank the National Science Foundation (Doctoral Dissertation Improvement Grant DEB-1501474), the Ohio State University (Presidential Fellowship), the Society for the Study of Evolution (Rosemary Grant Award for Graduate Student Research), and the Society of Systematic Biologists (Graduate Student Research Award). I would also like to thank Dr. Marshal Hedin, Dr. Robb Brumfield, and Dr. Frank Burbrink for writing letters of recommendation for me for the Presidential Fellowship. The bulk of my analyses were carried out on the Oakley cluster at the Ohio Supercomputer Center and the Ohio Biodiversity Conservation Partnership (OBCP) Informatics Infrastructure housed at the Museum of Biological Diversity. These two computing resources were fantastic. In addition, I would like to thank Joe Cora, who was an excellent resource when I had questions or needed assistance with the OBCP cluster.

I have been fortunate to have had many wonderful lab mates during my PhD. I would like to thank current students Ariadna Morales-Garcia, Megan Smith, Sergei Soloneko, and Greg Wheeler for excellent discussion and interactions over the years. I have thoroughly enjoyed our lab meetings and discussions of papers and ideas. I’m excited to see where their paths go during their academic careers! It’s crucial for me to thank the Carstens lab members present when I first entered the lab: Sarah Hird, John McVay, Tara Pelletier, and Noah Reid. My first year in the lab was incredible, and I have many wonderful memories of our time together at Louisiana State University. These four people were a big reason I initially joined Bryan’s lab, and I know I made the right decision and greatly value the time we had together. In addition, Danielle Fuselier was a superstar undergraduate in the lab at LSU, and helped with both field work and lab work during my first year. I would like to thank Michael Gruenstaeudl for great discussion and help with computer coding while he was a postdoc in the lab at Ohio State. I would also like to thank Matt Demarest and Maxim Kim for help with computer coding for data analysis while they worked in the lab during my first couple of years at Ohio State.

So many people have helped contribute to where I am today in my academic career; there are too many to thank but I will give it my best shot! First, I would like to thank my Master’s advisor, Dr. Marshal Hedin. Marshal was instrumental in getting me started with research, and in learning how to be a scientist. I wouldn’t be where I am today without Marshal’s guidance and tutelage, and am forever thankful that as an undergraduate at San Diego State University, I enrolled in a course called Terrestrial Arthropod Biology. That class changed my life. I would like to thank the many faculty and graduate students at LSU that I interacted with during the first year of my PhD. I had many wonderful discussions pertaining to science and life during this time, and look back fondly on those conversations and experiences. I would also like to thank the many faculty and graduate students at Ohio State. These past four years have been very important in my life, and the interactions, discussions, and experiences with these wonderful people have made it a time I will never forget. I am hesitant to list names for fear that I may leave someone out, but I have forged many strong friendships over these past few years, and those people know who they are. Academia is a gift and a curse in that we get the opportunity to meet such wonderful people, but many of our stays are fleeting. I’m thankful for all the wonderful relationships I’ve had over the years at Ohio State, and look forward to continuing our friendships as we move forward in our careers.

I would like to thank Greg Dahlem and Rob Naczi for help with the identification of flesh flies in my study. I would like to thank Maggie Koopman and Amanda Zellmer for data collection and analysis pertaining to Chapter 2, and for their roles in the earlier work on Sarracenia alata, which built a foundation for the system that I could then explore with the arthropods. I would like to thank Kelly Zamudio, the Zamudio lab, and Steve Bogdanowicz for assistance with RAD sequencing. I would also like to thank Mike Sovic for discussion regarding RAD sequencing and protocols, and various aspects of the analysis of RAD data.

Finally, I would like to thank my family, friends, and those closest to me for their love, support, and encouragement over the years as I pursued my PhD. Lisa Miller has been a wonderful girlfriend and companion the past couple years, and has provided much needed relief to get my mind off work and explore nature and the many wonderful things life has to offer. My older brother Josh, sister-in-law Aurora, and nephew Jack have provided love and support over the years, and I am truly thankful to have them in my life.

Lastly, I would like to thank my parents, June and Larry Satler. I am here because of their never-ending love, support, and encouragement. These past five years have been challenging, and they were always there to offer support and encourage me to push through the hard times, because hard work and persistence can often be the cure for life’s challenges. I’m fortunate to come from such a wonderful family, and can’t thank them enough for all they have done for me.


Vita

2002 ...... Madison High School

2007 ...... B.S. Biology, San Diego State University

2011 ...... M.Sc. Biology, San Diego State University

2012 ...... Ph.D. Student, Louisiana State University

2012 to present ...... Graduate Teaching Associate, Graduate Research Associate, Presidential Fellow, Department of Evolution, Ecology, and Organismal Biology, The Ohio State University

Publications

Satler JD, Carstens BC (2016) Phylogeographic concordance factors quantify phylogeographic congruence among co-distributed species in the Sarracenia alata pitcher plant system. Evolution, 70, 1105–1119.

Satler JD, Zellmer AJ, Carstens BC (2016) Biogeographic barriers drive co-diversification within associated eukaryotes of the Sarracenia alata pitcher plant system. PeerJ, 4, e1576.

Garrick RC, Bonatelli IAS, Hyseni C, Morales A, Pelletier TA, Perez MF, Rice E, Satler JD, Symula RE, Thome MTC, Carstens BC (2015) The evolution of phylogeographic datasets. Molecular Ecology, 24, 1164–1171.

Reid NM, Hird SM, Brown JM, Pelletier TA, McVay JD, Satler JD, Carstens BC (2014) Poor fit to the multispecies coalescent is widely detectable in empirical data. Systematic Biology, 63, 322–323.

Carstens BC, Pelletier TA, Reid NM, Satler JD (2013) How to fail at species delimitation. Molecular Ecology, 22, 4369–4383.

Carstens BC, Satler JD (2013) The carnivorous plant described as Sarracenia alata contains two cryptic species. Biological Journal of the Linnean Society, 109, 737–746.

Satler JD, Carstens BC, Hedin MC (2013) Multilocus species delimitation in a complex of morphologically conserved trapdoor spiders (Mygalomorphae, Antrodiaetidae, Aliatypus). Systematic Biology, 62, 805–823.


Carstens BC, Brennan RS, Chua V, Duffie CV, Harvey MG, Koch RA, McMahan CD, Nelson BJ, Newman CE, Satler JD, Seeholzer G, Posbic K, Tank DC, Sullivan J (2013) Model selection as a tool for phylogeographic inference: an example from the willow Salix melanopsis. Molecular Ecology, 22, 4014–4028.

Satler JD, Starrett J, Hayashi CY, Hedin M (2011) Inferring species trees from gene trees in a radiation of California trapdoor spiders (Araneae, Antrodiaetidae, Aliatypus). PLoS ONE, 6, e25355.

Fields of Study

Major Field: Evolution, Ecology, and Organismal Biology


Table of Contents

Abstract ...... ii

Acknowledgments ...... v

Vita ...... ix

Publications ...... ix

Fields of Study ...... xi

Table of Contents ...... xii

List of Tables ...... xiv

List of Figures ...... xv

Chapter 1: Introduction ...... 1

Chapter 2: Biogeographic barriers drive co-diversification within associated eukaryotes of the Sarracenia alata pitcher plant system ...... 10

Chapter 3: Phylogeographic concordance factors quantify phylogeographic congruence among co-distributed species in the Sarracenia alata pitcher plant system ...... 39

Chapter 4: Do ecological communities disperse across biogeographic barriers as a unit? ...... 75

Chapter 5: Conclusion ...... 110


References ...... 115

Appendix A: Chapter 2 Supplemental Material ...... 142

Appendix B: Chapter 3 Supplemental Material ...... 160

Appendix C: Chapter 4 Supplemental Material ...... 188


List of Tables

Table 1. Taxa included in the final comparative data set...... 26

Table 2. A chi-squared goodness of fit test ...... 32

Table 3. Sampling and sequencing information ...... 56

Table 4. AMOVA and IBD results...... 57

Table 5. Phylogeographic concordance factors for all permutations ...... 61

Table 6. Results from PyMsBayes analyses...... 64

Table 7. Genomic sequencing data ...... 89

Table 8. AMOVA results ...... 92

Table 9. Model selection using AIC and information theory ...... 93

Table 10. Estimates of population genetic parameters from FSC2 ...... 99


List of Figures

Figure 1. Sarracenia alata and arthropod community ...... 3

Figure 2. Sampling distribution of Sarracenia alata in Louisiana...... 16

Figure 3. Taxonomic composition of the OTUs for each sample site...... 24

Figure 4. Rarefaction curves of OTU richness at each sampling site...... 25

Figure 5. Population genetic structure for the inquiline community ...... 30

Figure 6. Distribution of Sarracenia alata in the southeastern United States ...... 45

Figure 7. Phylogeographic concordance factors between each arthropod species and the host pitcher plant...... 60

Figure 8. Community pattern of diversification ...... 63

Figure 9. Phylogeographic concordance factors estimated in two additional data sets .... 65

Figure 10. Models used in FSC2 analyses ...... 86

Figure 11. STRUCTURE results ...... 91

Figure 12. Divergence time estimates from FSC2 ...... 95

Figure 13. Effective population size estimates ...... 97

Figure 14. Effect of generation length on divergence time estimates ...... 107


Chapter 1: Introduction

Ever since Darwin, biologists have been interested in understanding the evolutionary processes that have produced the tremendous biological diversity on the planet. Both climatic and landscape changes (e.g., formation of rivers, uprising of mountains) as well as ecological interactions among species have contributed to the formation of biodiversity (e.g., Ehrlich and Raven 1964; Richardson et al. 2001; Vieites et al. 2007; McKenna et al. 2009; Edger et al. 2015), and identifying these contributions has been a major focus of research in ecology and evolutionary biology. This is particularly relevant in temperate regions, where environmental changes during the Pleistocene ice ages have been implicated as key drivers of the formation of new species across many groups (e.g., Hewitt 2000, 2004).

Investigations of ecological communities (i.e., groups of interacting species) can provide critical data for understanding how and when such communities came into existence, as well as provide evidence for their stability through time (Smith et al. 2011). The extent to which the constituent species within an ecological community are expected to coevolve (i.e., share a history of co-diversification and association), however, is unclear. My dissertation research explores a key question of evolutionary biology (do ecological communities co-diversify?) in a study system consisting of the carnivorous pitcher plant Sarracenia alata and its inquiline community.

Carnivorous plants have long fascinated biologists (e.g., Darwin 1875). Carnivory appears to have evolved independently at least six times in the Angiosperms, a remarkable transition in their evolution of resource acquisition (Ellison and Gotelli 2009).

One of the largest radiations of carnivorous plants, Sarracenia contain tubular-shaped leaves that fill with fluid and serve as an effective pitfall trap for prey. These pitchers also play host to a dynamic community of arthropods and microorganisms in their modified leaves (Fig. 1); the community lives in the oxygen-rich upper layer of the fluid column, is not digested by the fluid, and appears to provide important ecological services to the plants (Adlassnig et al. 2011).


Figure 1. Sarracenia alata and arthropod community. Shown is a selection of the diverse arthropods found in this system, described from bottom left and moving clockwise. Bottom left shows three adult pitcher plant moths (Exyra semicrocea) inside a pitcher. The next picture shows pitcher plant moth (E. semicrocea) larvae sealing off the top of the pitcher with silk. The top two pictures are of spiders: on the left is a crab spider (Thomisidae), and on the right is a green lynx spider (Peucetia viridans). The two pictures on the right are of pitcher plant flesh flies (Sarcophagidae). The moth and flies are ecologically dependent on the pitcher plant for their life cycle; the spiders are opportunistic predators commonly encountered in this habitat but not restricted to the pitcher plant system.


Since the introduction of the discipline of phylogeography (Avise et al. 1987), the American southeast has received tremendous attention and has a rich history of phylogeographic research (reviewed in Soltis et al. 2006). This is a landscape dominated by major rivers, with the most conspicuous being the Mississippi River, the largest river system in North America (Coleman 1988). This ancient river dates to the Mesozoic era, with a stream present since the Jurassic (Mann and Thomas 1968). During the Pleistocene glacial cycles, glaciers never reached these southern latitudes, but sea level fluctuations combined with shifts in the direction and flow of rivers influenced biological and genetic structure in the region. One species largely structured by these major rivers is the pale pitcher plant, Sarracenia alata. This plant is found along the Gulf Coast of the American southeast, with a distribution bisected by the Mississippi River. Zellmer et al. (2012) revealed extensive population genetic structure within S. alata, and suggested that major rivers in the region have promoted the isolation of populations within the plant. I collected additional molecular data and demonstrated that S. alata is composed of two cryptic species corresponding to either side of the Mississippi River, and suggested that these populations have been isolated from each other for hundreds of thousands of generations (Carstens and Satler 2013). This work highlighted the isolating effect of the Mississippi River, a pattern seen in numerous taxa (e.g., Burbrink et al. 2000; Brant and Ortí 2003).

A diverse assemblage of microorganisms inhabits the pitcher fluid inside the modified leaves of the plant (Adlassnig et al. 2011). If population structure in these associated species is similar to that of S. alata, this would suggest interactions that have been stable through time, and demonstrate that the species showed a shared response to landscape and climatic changes. As relationships between the various species and the pitcher plant range from facultative to obligate, and the species differ in their requirements for the pitcher as habitat, I hypothesize that species which rely upon this specialized habitat would have population genetic structure that matches that of the host plant (i.e., reflecting a pattern of co-diversification), whereas species that are generalists and loosely associated with the plant would display idiosyncratic patterns of population structure.

Essentially all studies of taxonomic identity and ecological interactions among Sarracenia inquilines have been centered around the purple pitcher plant, Sarracenia purpurea (e.g., Harvey and Miller 1996; Miller and Kneitel 2005). Outside of recent work on the microbial communities of S. alata (Koopman et al. 2010; Koopman and Carstens 2011), little is known about the micro-eukaryotes inhabiting this unique system.

For Chapter 2, I analyzed over 9,000 DNA sequences recovered from the pitcher fluid across five populations, spanning the Mississippi River. I used BLAST searches to identify taxonomic diversity among these sequences, and clustering approaches to identify operational taxonomic units (OTUs) for population genetic analyses. For the taxa spanning the Mississippi River, roughly half of the species contained population genetic structure reflecting that of the pitcher plant; the most parsimonious explanation is that these species have co-diversified with the plant. To my knowledge, this is the first study that investigates the evolutionary history of the micro-eukaryotes found within a pitcher plant system, and demonstrates that ecology plays a role in shaping community structure and relationships through time (Satler et al. 2016).


Analyzing phylogeographic data sets in a community approach is challenging, as tools for these analyses lag far behind the development of tools for single-taxon questions. Example approaches include those that utilize species distribution modeling in a community, such as Hugall et al. (2002), who used paleoclimatological modeling of a snail species in the Australian Wet Tropics to determine suitable habitat during the Pleistocene glacial cycles and identify areas of glacial refugia within the region. Other tools are explicitly based on data from the species range. For example, in a study of co-distributed species in California, Lapointe and Rissler (2005) recoded occurrence data from nine species in order to assess congruence across species using maximum agreement subtrees. They found support for congruence in diversification patterns when the subtrees were built into a supertree. Other approaches evaluate a set of models common to the comparative system, using tools such as Approximate Bayesian Computation in a hierarchical framework to test for simultaneous divergence (Hickerson et al. 2007). Despite these approaches, the majority of comparative phylogeographic studies test for concordance on a species-by-species basis (e.g., Avise 2000). Such an approach, however, is limited. Before they can be compared across species, estimates of parameters such as divergence time must be converted to years, which requires knowledge of the neutral substitution rate and generation length. Systems that contain non-model organisms with broad taxonomic diversity are particularly challenging because mutation rates and generation lengths are likely not well known and differ among the taxa. More common, however, is the comparison among genealogies for inferring congruent evolutionary histories (Arbogast and Kenagy 2001). This process has been entirely qualitative to date; unless two (or more) trees are identical, the degree to which they are different and the evolutionary processes driving those differences are left to interpretation by the researcher. In addition, two species may share similar evolutionary histories but exhibit striking differences in their gene tree topologies due to deep coalescence or differences in life history traits (Knowles and Maddison 2002).
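The unit conversion discussed above can be sketched as follows. This is a minimal illustration, assuming that divergence is estimated in expected substitutions per site; the function name and all numerical values are hypothetical and are not taken from the analyses in this dissertation.

```python
def divergence_time_in_years(tau, mu_per_site_per_generation, generation_length_years):
    """Convert a divergence time from substitutions per site to years.

    tau: divergence time in expected substitutions per site (hypothetical input)
    mu_per_site_per_generation: assumed neutral substitution rate
    generation_length_years: assumed generation length in years
    """
    if mu_per_site_per_generation <= 0 or generation_length_years <= 0:
        raise ValueError("rate and generation length must be positive")
    # Divide by the per-generation rate to obtain time in generations,
    # then multiply by generation length to obtain time in years.
    generations = tau / mu_per_site_per_generation
    return generations * generation_length_years


# Two hypothetical co-distributed taxa with the same estimate of tau
# but different assumed rates and generation lengths.
tau = 0.001
taxon_a_years = divergence_time_in_years(tau, 1e-8, 1.0)  # one generation per year
taxon_b_years = divergence_time_in_years(tau, 2e-9, 0.5)  # two generations per year

print(f"Taxon A: {taxon_a_years:,.0f} years")
print(f"Taxon B: {taxon_b_years:,.0f} years")
```

Under these assumed values, identical estimates in mutation-scaled units translate into different dates, which is why uncertainty in substitution rate and generation length complicates comparisons of divergence times across taxonomically diverse species.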

Existing approaches for analyzing community data in a comparative framework are inadequate, forcing researchers to make inferences based on qualitative assessments of patterns in the data. Because I was not satisfied by this approach, I developed new theory and proposed a novel method for quantifying the evolutionary history of a community of species (Chapter 3). This new method makes three important contributions to the field: (i) it quantitatively tests the prediction that the level of ecological association correlates with shared evolutionary history, (ii) it provides an estimate of the evolutionary history of an entire community, and (iii) it identifies species that have not co-diversified with the community. As proof of concept, I tested my Phylogeographic Concordance Factors (PCF) method using Sanger sequencing data from multiple arthropod species found within the S. alata pitcher plant system. The sampled species, which include a moth and a spider, range from specialists to generalists with respect to the pitcher plant. Results of the PCF analysis demonstrate that ecology correlates with evolutionary history, suggesting that many of these arthropods have co-diversified with the plant. The opportunistic predator (spider) shows high levels of discordance with the host plant, a result predicted based on the loose association between the two species, which suggests a transient relationship between the spider and the plant. This novel method highlights the Mississippi River as a driver of biological diversification, and the spider as an outlier species to the community (Satler and Carstens 2016).

My dissertation work suggests that the Mississippi River has been an important factor in the evolution of the S. alata pitcher plant community (Chapters 2 and 3). To better understand if this is a shared response to the river, it is necessary to estimate the population splitting times among the co-distributed species. In Chapter 4, I explored the timing of divergence in the S. alata arthropod community. In this system, the Mississippi River split S. alata into two populations, and inferences made from genetic data suggest this split took place >120,000 years before present (Zellmer et al. 2012). I hypothesize that the species tightly linked to S. alata (e.g., flesh flies, moth) will show similar times of population divergence, whereas those species loosely associated with the system (e.g., the spider) will have random divergence times. I used restriction-site associated DNA (RAD) sequencing to generate large genomic data sets, resulting in hundreds to thousands of loci and SNPs for each species. Divergence time estimates suggest that two of the obligate arthropods diverged synchronously with S. alata, roughly 180,000 years before present, and that this community dispersed in an east-to-west manner. A third obligate arthropod is estimated to have dispersed across the river at a later time, with population genetic parameters supporting the same east-to-west direction. This is consistent with our knowledge of the origin and diversification of Sarracenia (Stephens et al. 2015), and demonstrates that some members of this ecological community dispersed across this major, but permeable, biogeographic barrier as a unit (Satler and Carstens, in review).


Finally, I summarize the results and findings from my dissertation, and discuss important information and conclusions from my research. In general, the results support the hypothesis that multiple species display ecological interactions that have been stable on evolutionary time scales. Evidence is demonstrated in shared spatial and temporal patterns, and highlights the S. alata pitcher plant system as an example of an evolutionary community. This dissertation provides insight into how ecological interactions can persist leading to patterns of co-diversification.


Chapter 2: Biogeographic barriers drive co-diversification within associated eukaryotes of the Sarracenia alata pitcher plant system

2.1. INTRODUCTION

Dynamic processes during the Pleistocene epoch have been implicated as drivers of biological diversification (e.g., Hewitt 2000, 2004). Glacial cycles contributed to both landscape changes and climatic oscillations, providing strong abiotic factors that have led to speciation within many groups (e.g., Leaché and Fujita 2010; McCormack et al. 2010). One region strongly influenced by these processes during the Quaternary is the southeastern United States, where decades of research have examined the structure of genetic variation in a diverse set of taxa (e.g., Avise et al. 1987; Avise 2000; Burbrink et al. 2000; Weisrock and Janzen 2000; Jackson and Austin 2010; Newman and Rissler 2011). Although glaciers never extended to this latitude, changes in both the flow rate and direction of major rivers, coupled with fluctuations in sea level, influenced phylogeographic patterns in this region (reviewed in Soltis et al. 2006). Specifically, major rivers in the region have produced population genetic structure in many clades, with the Mississippi River recognized as a well-characterized biogeographic barrier (Brant and Ortí 2003; Pyron and Burbrink 2009). The influence of landscape features coupled with the presence of large-scale barriers can be expected to isolate populations within a species, especially those with limited dispersal abilities. Consequently, plants and animals that lack the ability to traverse large bodies of water are expected to exhibit substantial population genetic structure in this region.

Complex interactions that occur within ecological communities can influence the formation and maintenance of biodiversity. For example, numerous studies have shown how host plant diversification can contribute to the diversification of associated species, typically insects (e.g., Farrell and Mitter 1990; Wheat et al. 2007; McKenna et al. 2009; Espindola et al. 2014). These include systems where plants evolve secondary compounds in an “escape and radiate” model of coevolution (Ehrlich and Raven 1964), and systems that include mutualist organisms such as plants and their pollinators. Such interactions can result in congruent demographic histories (e.g., Smith et al. 2011) and patterns of co-diversification (e.g., Rønsted et al. 2005). While it seems clear that the ecological interactions among plants and associated arthropods (e.g., herbivores and pollinators) can potentially drive patterns of co-diversification, it is unclear how host plants may influence other commensal organisms, particularly small eukaryotes. Communities of commensal organisms in both facultative and obligate relationships may be expected to show varying evolutionary patterns attributed to the level of dependency on the host plant. Given the dynamic and topologically complex landscape of the southeastern region, the study of ecological communities that span the breadth of host affinity, dispersal ability and life history traits can help inform how taxonomically diverse communities have assembled through time, and whether present day ecological associations extend into the deep past.


Phytotelmata—water bodies contained within living plants—provide an ideal system for investigating co-diversification within an ecological community because they are self contained and discrete units (Kitching 2000). Carnivorous pitcher plants are one such system, where decades of ecological work have documented a complex and distinct ecosystem associated with the pitcher fluid contained within the modified leaves. Pitcher plants in the Sarracenia (family Sarraceniaceae) contain a diverse microbiome, including groups such as bacteria, algae, protists, rotifers and arthropods (e.g., Folkerts

1999; Miller and Kneitel 2005; Peterson et al. 2008; Koopman et al. 2010). Their highly modified leaves form a trap that captures and digests prey items, while also providing a unique habitat for commensal organisms. Associated inquilines form complex relationships in the pitchers, with many supplying digestive enzymes that help break down decomposing prey items, providing inorganic compounds for the plant (see

Adlassnig et al. 2011). A wide range of ecological work has investigated the communities associated with these plants, primarily in Sarracenia purpurea, showing community structure and interactions among the inquilines (e.g., Addicott 1974; Bradshaw and

Creelman 1984; Buckley et al. 2003; Gotelli and Ellison 2006; terHorst et al. 2010;

Miller and terHorst 2012). Here, we focus on the Pale Pitcher Plant Sarracenia alata, a species distributed in patchy habitats along the Gulf Coast across eastern Texas, Louisiana,

Mississippi, and Alabama. This species is largely isolated from its congeners and occupies disjunct eastern and western regions across the Mississippi River (Fig. 2). Work by Koopman and Carstens (2010) identified population genetic structure in S. alata, and

Zellmer et al. (2012) showed that major rivers in the region promoted diversification

within the plant. Population divergence across either side of the Mississippi River is likely well into the Pleistocene, and estimated at greater than 120,000 years before present (Zellmer et al. 2012). Further analysis suggests that S. alata may contain two cryptic species, corresponding to populations on the eastern and western sides of the

Mississippi River (Carstens and Satler 2013). Sarracenia alata thus represents a particularly attractive system for investigating patterns of co-diversification, because the species exhibits strong genetic differentiation across the landscape, with significant divergence across an important biogeographic barrier (Soltis et al. 2006). In addition, longleaf pine savannahs in the south have seen a staggering amount of habitat loss in recent times (~1% of its original habitat remains; Noss 1989). High levels of cryptic genetic diversity highlight S. alata as a species of interest; identifying ecologically associated taxa with a shared evolutionary history has clear conservation implications.

Phylogeographic investigations of co-distributed taxa are usually limited to particular taxonomic groups (e.g., Bell et al. 2012; Fouquet et al. 2012; Smith et al. 2012;

Hope et al. 2014). While these studies can reveal evolutionary processes that produce patterns within biogeographic regions, the conclusions drawn from such findings can be limited by the shared life history traits that influence the formation of genetic structure

(e.g., dispersal ability, population size). Metagenomics provides a powerful approach for efficiently and rapidly sampling taxonomic diversity within a habitat (reviewed in Tringe and Rubin 2005), and thus may provide comparative phylogeographic investigations with an efficient approach to the sampling of taxa. Through the sequencing of environmental

DNA, communities of small to microscopic organisms can be directly sampled from the

environment resulting in the assemblage of a data set spanning a wide taxonomic breadth.

Thus, when coupled with next generation sequencing methods (Mardis 2008), metagenomics greatly increases the “taxonomic toolbox” lending itself well to investigations of comparative phylogeography. By analyzing a disparate assemblage of taxa comprising an ecological community, our work has the potential to reveal a shared response to historical events and thus evidence that evolutionary processes can shape community structure and interactions through time (Smith et al. 2011). With the diverse array of microscopic inquilines present within Sarracenia (e.g., Miller and Kneitel 2005), pitcher plants provide an ideal system for understanding how a host plant may influence genetic variation within an associated community, and metagenomics provides a tool for sampling this taxonomic diversity.

Here we explore the process of evolutionary diversification in an ecological community. We directly sample pitcher fluid from the modified leaves of S. alata, and apply a novel approach utilizing metagenomics to test if S. alata has influenced genetic structure in its eukaryotic commensal organisms. First, we characterize taxonomic diversity within the pitcher plant fluid to get an understanding of the major lineages and their abundance in this unique habitat. We then generate a comparative data set of OTUs which span the Mississippi River, and assess the degree to which the inquiline community shares population genetic structure with the host plant. We hypothesize that if eukaryotes associated with S. alata are ecologically dependent on the plant, then the evolutionary history of the commensals should exhibit population genetic structure largely congruent with that of S. alata. Alternatively, if taxa do not share an evolutionary

history with S. alata, community members should have unique population genetic structure indicating an idiosyncratic response to landscape processes driving diversification in the region.

2.2. MATERIAL AND METHODS

2.2.1. Genetic Sampling

Pitcher fluid samples were collected following Koopman and Carstens (2010) during the spring and summer of 2009 from part of the plant’s distribution. Specifically, samples were collected from 40 individuals across four locales (Abita Springs, Cooter’s

Bog, Kisatchie, Talisheek; Fig. 2) in Louisiana during June and August, as pitcher diversity peaks at this time (Koopman et al. 2010). In addition, fluid was collected from ten individuals per month for five months (April through August) from Lake Ramsey, resulting in 50 samples, for a total sampling effort of 90 individuals from five locales.

Sampling was originally designed to investigate both spatial (all five locales) and temporal (Lake Ramsey) dynamics; however, we focus on just spatial patterns in this study. DNA was extracted using the PowerSoil DNA Isolation Kit (MO Bio). The large subunit 28S rRNA region was amplified for each fluid sample using the following primer combination (LS1F: GTACCCGCTGAACTTAAGC; LS4R:

TTGTTCGCTATCGGTCTC; modified from Hausner et al. 1993), targeting a roughly

330 base pair (bp) region. Each pitcher fluid sample was labeled with MID tags to allow for multiplexing of individuals. PCRs were performed in triplicate and then pooled to reduce PCR bias, and subsequently sequenced on a 454 Life Sciences Genome


Sequencer FLX (Roche) at Engencore Genomics Facility (University of South Carolina,

Columbia) using 1/8th of a plate. Raw sequences were initially processed using Mothur

(Schloss et al. 2009) to sort sequences by individual, remove low quality reads, and identify unique sequences for each individual. Chloroplast data for S. alata was gathered from a previous study (see Carstens and Satler 2013 for details).

Figure 2. Sampling distribution of Sarracenia alata in Louisiana. Sample sites are partitioned based on side of the Mississippi River. Red circles represent Kisatchie (K) and Cooter’s Bog (C) in the west; blue squares represent Lake Ramsey (L), Abita Springs (A) and Talisheek (T) in the east. Scale bar: 100 km.

2.2.2. Bioinformatics

To quantify the taxonomic diversity present within the pitchers, sequences were clustered into operational taxonomic units (OTUs) through de novo assembly.

Metagenomic studies commonly use de novo assembly for generating OTUs (e.g.,

O'Brien et al. 2005; Bik et al. 2012; Zimmerman and Vitousek 2012), and this allowed for a rough characterization of the number of taxonomic units present within the pitcher fluid.

Sequences were first combined within each of the sampling locales (i.e., we restricted clustering to those sequences collected from within each sample site), thereby treating each of the five sites as a separate population (see Fig. 2). Reads were trimmed to

275 bp, discarding any sequences below this threshold to remove potential bias associated with clustering samples of unequal sequence size. For consistency, we only analyzed sequences from Lake Ramsey collected at the same time periods as from the other sampling sites. Trimmed sequences were assembled into clusters using the

UPARSE algorithm (Edgar 2013); this pipeline has been shown to outperform commonly used clustering methods such as Mothur and QIIME, and to work well under a solely de novo clustering approach. Within each locale, identical reads were collapsed and abundance values recorded (i.e., the number of times each unique read appeared in the data set). Sequences were then clustered based upon a 97% threshold, with the most represented sequences (based on abundance values) used to form initial OTU clusters,

using a dynamic programming algorithm to find clusters with the maximum score. The percent similarity threshold is subjective, but since it is required for de novo assembly, we justify this value by noting that (i) it was recommended by the author for de novo assembly in UPARSE (Edgar 2013), (ii) it falls within the range used to delimit fungi with this locus (see Sota et al. 2014 and references within), a group expected to be well represented within the pitcher fluid, and (iii) chimera detection is increasingly difficult when this value is decreased. The clustering step in the pipeline (“cluster_otus”) uses

UPARSE-OTU, an algorithm that simultaneously determines the OTU clusters while removing chimeric sequences from the data set, a potential problem due to errors with pyrosequencing.
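
The trimming, dereplication, and clustering steps described above can be illustrated with a minimal Python sketch. It is not the UPARSE-OTU algorithm: it uses a simple greedy centroid rule, omits chimera filtering and the maximum-score search, and measures identity as the fraction of matching positions between equal-length reads. The function names (trim_reads, dereplicate, greedy_cluster) are hypothetical and chosen for illustration only.

```python
from collections import Counter


def trim_reads(reads, length=275):
    """Truncate reads to a fixed length and discard any shorter read."""
    return [r[:length] for r in reads if len(r) >= length]


def dereplicate(reads):
    """Collapse identical reads; return (sequence, abundance) pairs, most abundant first."""
    return Counter(reads).most_common()


def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(unique_reads, threshold=0.97):
    """Assign each unique read to the first centroid it matches at >= threshold.

    Reads are visited in order of decreasing abundance, so abundant
    sequences seed the clusters. Returns {centroid: total abundance}.
    """
    clusters = {}
    for seq, count in unique_reads:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            clusters[seq] = count
    return clusters
```

Because all reads are trimmed to the same length before clustering, the positional identity used here is well defined; a real pipeline would align reads rather than compare them position by position.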

Following OTU clustering, a single sequence from each cluster was used with a

Basic Local Alignment Search Tool (BLAST) search to gather taxonomic identification for each of the clusters. Although there is a concern with the incompleteness of public databases, and that searches could return spurious matches (Koski and Golding 2001;

Tringe and Rubin 2005), at a higher taxonomic level (e.g., Class, Order), we can be reasonably confident that sequence matches reveal organismal affinity. In addition, for the purposes of this study, a qualitative assessment of higher level identification is sufficient to understand taxonomic diversity present within the pitcher fluid. A custom Python script was used to search for taxonomic identities among the OTUs. For all

BLAST searches, sequences representing the centroid of the original OTU searches (in

UPARSE) were queried against the NCBI nucleotide database, using the Megablast search algorithm, saving the top hit from each search. OTUs with BLAST hits were

grouped by higher-level identification, generally at the level of Class or Order, to identify the variety of organisms present within the pitcher fluid. After summarizing taxonomic identity within the pitcher fluid at each site, rarefaction curves were generated with the package vegan (Oksanen et al. 2015) in R (R Core Team 2015), as a means to test if the taxonomic diversity had been adequately captured with our sampling efforts.
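
The rarefaction analysis was run with vegan in R. As an illustration of what such a curve computes, the following Python sketch evaluates the standard expected-richness formula for a random subsample of m sequences drawn without replacement. The function name rarefied_richness and the toy abundance counts are assumptions for illustration, not values from this study.

```python
from math import comb


def rarefied_richness(otu_counts, m):
    """Expected number of OTUs in a random subsample of m sequences.

    otu_counts: number of sequences assigned to each OTU at one site.
    Uses E[S_m] = sum_i (1 - C(N - n_i, m) / C(N, m)), where N is the
    total number of sequences and n_i is the abundance of OTU i.
    """
    total = sum(otu_counts)
    if not 0 <= m <= total:
        raise ValueError("m must be between 0 and the total number of sequences")
    denom = comb(total, m)
    return sum(1 - comb(total - n, m) / denom for n in otu_counts)


# Toy abundances (hypothetical): a few common OTUs and several rare ones.
counts = [50, 20, 10, 5, 2, 1, 1, 1]
curve = [(m, rarefied_richness(counts, m)) for m in range(0, sum(counts) + 1, 10)]
```

A curve that is still rising at the full sample size indicates that additional sequencing would likely recover more OTUs.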

2.2.3. Population Structure

The major goal of this study is to identify OTUs that span the Mississippi River, and test if the landscape processes that have influenced diversification in S. alata have influenced the sampled organisms in a similar manner. To generate a comparative data set, all raw sequences were combined and OTUs were assembled with UPARSE following the steps outlined above (i.e., all sequences were clustered in a global analysis, regardless of sampling location). This data set included all sequences generated from

Lake Ramsey, as we were interested in collecting taxa with widespread distributions. If taxa were time dependent, they would be restricted to Lake Ramsey (during the months when only this locality was sampled) and removed following our filtering process (see below); however, taxa stable in these communities would comprise additional sequence information for comparative analysis. Following initial OTU clustering, the data set was reduced to those taxa that contained at least ten sequences per OTU with a minimum of three sequences on either side of the Mississippi River. These thresholds were used to maximize the number of OTUs represented in the final data set while still containing enough sequence data for statistical inference, both within and across sampling sites. In

addition, it is expected that any potential chimeric sequences not removed in the clustering step will fall below these thresholds, further reducing the potential for error with our final OTUs. Each OTU was aligned with MAFFT (Katoh and Standley 2013), using either the L-INS-i ( < 200 sequences) or FFT-NS-i ( > 200 sequences) algorithm.

To survey taxonomic diversity among the retained OTUs, a BLAST search was conducted on each of the OTUs following the same steps as outlined above.
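
The two retention thresholds described above (at least ten sequences per OTU, and at least three on each side of the Mississippi River) can be expressed as a short Python sketch. The site-to-region mapping follows Fig. 2; the function name passes_filter and the example OTU tables are hypothetical.

```python
# Region assignment follows Fig. 2: K and C lie west of the Mississippi River;
# L, A, and T lie east of it.
REGION = {"K": "west", "C": "west", "L": "east", "A": "east", "T": "east"}


def passes_filter(site_counts, min_total=10, min_per_side=3):
    """Return True if an OTU meets both retention thresholds.

    site_counts: {site code: number of sequences of this OTU at that site}.
    """
    per_side = {"west": 0, "east": 0}
    for site, n in site_counts.items():
        per_side[REGION[site]] += n
    total = per_side["west"] + per_side["east"]
    return total >= min_total and min(per_side.values()) >= min_per_side


# Hypothetical OTU tables, for illustration only.
otus = {
    "otu_a": {"K": 4, "C": 1, "L": 20, "A": 3, "T": 2},  # widespread: kept
    "otu_b": {"L": 40, "A": 5},                           # east only: dropped
    "otu_c": {"K": 3, "T": 3},                            # too few reads: dropped
}
retained = [name for name, counts in otus.items() if passes_filter(counts)]
```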

Data were summarized within each of the OTUs in order to characterize genetic variation and quantify population genetic structure. Standard population genetic summary statistics (nucleotide diversity (π), Watterson's theta (Θw), and Tajima's D) were calculated with the package pegas (Paradis 2010) in R. Several approaches were used to explore the partitioning of genetic variation among the OTUs. GST (Nei 1973) values were generated to estimate the degree of population differentiation among the locales, and were calculated with the R package gstudio (Dyer 2012). The level of genetic partitioning was assessed with an analysis of molecular variance (AMOVA; Excoffier et al. 1992), because the GST is an analog to FST values (Nei 1973). AMOVAs take into account the amount of variation in the sequence data, thereby extracting more information to determine the level of spatial structuring within the taxa. AMOVAs were calculated in the program SPADS (Dellicour and Mardulyn 2013), with 10,000 permutations to generate levels of significance. Hierarchical levels tested included (i) sampling locales within each region (i.e., side of the Mississippi River), (ii) sampling locales within total distribution, and (iii) between regions. In addition, the amount of allelic sorting on either side of the Mississippi River was calculated using the

genealogical sorting index (GSI; Cummings et al. 2008). This method is commonly applied to tests of taxonomic distinctness; it is applied here to quantify levels of lineage sorting within each side of the river, with higher levels of sorting suggesting greater population genetic structure indicative of a longer period of population isolation. GSI values range from 0 (no sorting) to 1 (monophyletic on either side of the barrier), with p-values indicating the extent to which genetic structure recovered is more than would be expected by chance alone. An input genealogy is required to calculate the GSI; these were estimated using Maximum Likelihood (ML) with RAxML v7.2.8 (Stamatakis 2006;

Stamatakis et al. 2008). Depending on the number of sequences in the OTU, models of sequence evolution included either GTRCAT ( > 200 sequences) or GTRGAMMA ( <

200 sequences). Each ML tree was then input to the GSI web server, with 10,000 permutations to generate levels of significance. In addition, isolation by distance (IBD) values were calculated to see if there was a correlation between genetic and geographic distance, using the IBDWS v3.23 web server (Jensen et al. 2005). Genetic distance matrices were calculated using a Kimura 2-parameter (K2P) substitution model for each

OTU; geographic matrices were constructed measuring the Euclidean distance between sampling locales in kilometers with the distance measurement tool in Google Earth

(www.google.com/earth/, last accessed 18 July 2015). Finally, we used a chi-squared goodness of fit test to see if the number of OTUs with significant population genetic structure across the various analyses was more than would be expected by chance alone

(assuming α = 0.05). This allowed us to test the null hypothesis that there is no

correlation of population structure between the members of the eukaryotic community and the host plant.
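
For reference, the Kimura 2-parameter distance used for the genetic distance matrices has a closed form, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences between two aligned sequences. The Python sketch below applies it to a single pair; skipping gaps and ambiguous bases is a simplifying assumption here and may not match how IBDWS treats them, and the function name k2p_distance is hypothetical.

```python
from math import log, sqrt

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * log((1 - 2 * p - q) * sqrt(1 - 2 * q))
```

For example, one transition across ten compared sites gives P = 0.1 and Q = 0, so d = -0.5 ln(0.8), slightly more than the uncorrected proportion of 0.1.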

2.3. RESULTS

2.3.1. Genetic Sampling

High-throughput sequencing resulted in a total of 26,399 sequences across 90 sampled pitchers. Following demultiplexing and quality control of samples, an average of

101 unique sequences were retained per pitcher (range: 12 – 199) for a total of 9,045 sequences. A fasta file containing all 9,045 sequences, as well as all OTU matrices from the comparative data set (see below), has been deposited at Dryad

(doi:10.5061/dryad.j3n0g).

2.3.2. Taxonomic Diversity

To remove biases associated with the clustering of length variable sequences, all sequences were trimmed to 275 bp (discarding any reads below this threshold), reducing the data set from 9,045 sequences to 8,991 sequences. Lake Ramsey contained a disproportionately larger percentage of the total number of sequences (49%); however, to compare samples collected from the same time periods, we only analyzed those samples from June and August, reducing the number of sequences from Lake Ramsey from 4,398 to 2,286, resulting in a total of 6,879 sequences. OTU clustering at the 97% sequence identity within each locale resulted in a median of 66 OTUs per sample site (324 total), ranging from 48 (Cooter’s Bog) to 82 (Lake Ramsey) total OTUs, with an average of 21

sequences per OTU when averaged across all sites. The majority of OTUs had a close hit in the BLAST search (97%), although a small number of OTUs (13) did not contain a match in the database (Fig. 3). Taxonomic diversity ranged across the tree of life, with many OTUs containing hits to fungi, and to a lesser extent, various arthropod groups, including insects and mites (Appendix A.3). In addition, numerous other groups were recovered in the searches, including protozoans, nematodes, an annelid, and even a vertebrate (Sus scrofa, wild boar). Rarefaction curves for each sample site suggest that

OTU richness has not yet reached a plateau, indicating that the pitcher plant community was not fully sampled at any of the sites (Fig. 4). Although fewer sequences per site likely prevented us from obtaining representatives from the full diversity of species within each pitcher, wider spatial sampling helped us achieve our goal of sampling a large number of eukaryotic species for a comparative data set (see below).


Figure 3. Taxonomic composition of the OTUs for each sample site. Each site contains the number of OTUs (N) and the major lineages in which they belong. Panels: C (N = 48), A (N = 53), K (N = 75), L (N = 82), T (N = 66). See Appendix A.3 for full taxonomic information.


Figure 4. Rarefaction curves of OTU richness at each sampling site (x-axis: number of sequences; y-axis: number of OTUs; one curve per locality: L, K, T, A, C).

2.3.3. Population Structure

A global clustering effort was completed to generate a comparative data set for taxa that span the Mississippi River. As we were interested in widespread taxa, we used all sequences collected from Lake Ramsey, including those collected from additional time periods, resulting in the use of the full data set (8,991 sequences). Following de novo clustering, UPARSE produced 323 OTUs of which 65 contained a minimum of ten sequences and of these, 31 OTUs contained at least three representatives on either side of the river. BLAST hits of a single sequence from each of the 31 OTUs indicate that fungi and mites are the most well represented taxa (Table 1; see Appendix A.4 for BLAST information). One OTU did not contain a significant BLAST hit, and with parameters relaxed, poorly matched a portion of the sequence to multiple disparate taxonomic groups. Since we detected it in multiple pitchers, it seems unlikely that this OTU represents a chimeric sequence. Given the incompleteness of taxonomic databases, however, we retained this OTU for downstream analysis, resulting in a final dataset of 31

OTUs (see Appendix A.1 for the sequencing distribution among locales). In this final set, the number of sequences per OTU ranged from 14 to 2,507, with a median of 54 (average of 225 sequences; Table 1).

Table 1. Taxa included in the final comparative data set. Information for OTUs includes the number of sequences (N), their nearest BLAST hit (except for S. alata), nucleotide diversity (π) and Watterson’s theta (Θw) per site, Tajima’s D, and GST. Significance of GST and Tajima’s D (D following a beta distribution, rescaled to 0,1; Tajima, 1989) at α = 0.05 is indicated with an asterisk (*).

Taxa        N     BLAST                               π       Θw      Tajima's D  GST
Fungi1      52    Cladosporium sp. (Fungi)            0.0088  0.0244  -2.1585*    0.1099*
Fungi2      51    Fusarium annulatum (Fungi)          0.0130  0.0299  -2.0211*    0.2335
Fungi3      22    Curvularia sp. (Fungi)              0.0157  0.0272  -1.7148     0.1419
Fungi4      2507  Candida saitoana (Fungi)            0.0059  0.0918  -2.6902*    0.0025
Fungi5      97    Candida saitoana (Fungi)            0.0072  0.0203  -2.0520*    0.0514
Fungi6      168   Candida saitoana (Fungi)            0.0090  0.0187  -1.6674     0.0167
Fungi7      84    Candida saitoana (Fungi)            0.0088  0.0217  -2.0016*    0.0711
Fungi8      57    Candida quercitrusa (Fungi)         0.0134  0.0341  -2.0901*    0.1150
Fungi9      30    Candida saitoana (Fungi)            0.0010  0.0053  -2.3512*    0.0991
Fungi10     54    Candida saitoana (Fungi)            0.0050  0.0183  -2.3958*    0.0788
Fungi11     189   Mucor circinelloides (Fungi)        0.0078  0.0307  -2.3694*    0.0090
Fungi12     14    Uncultured soil (Fungi)             0.0087  0.0100  -0.6488     0.2805
Fungi13     40    Uncultured fungus (Fungi)           0.0103  0.0125  -0.6810     0.2494
Fungi14     766   Fungal endophyte (Fungi)            0.0117  0.0586  -2.4374*    0.0052
Fungi15     15    Nigrospora sphaerica (Fungi)        0.0106  0.0188  -1.8028*    0.0451
Amoebozoa1  227   Fuligo septica (Amoebozoa)          0.0042  0.0347  -2.6856*    0.0411
Alveolata1  18    Leptopharynx costatus (Alveolata)   0.0182  0.0324  -1.8094*    0.0418
Nematoda1   79    Nematoda sp. (Nematoda)             0.0150  0.0317  -1.8654*    0.4101
Nematoda2   154   Nematoda sp. (Nematoda)             0.0130  0.0174  -0.9456     0.2359
Nematoda3   21    Nematoda sp. (Nematoda)             0.0188  0.0170  0.2858      0.0624
Insect1     61    Brachymyrmex depilis (Insecta)      0.0122  0.0285  -2.0101*    0.0346
Insect2     37    Solenopsis xyloni (Insecta)         0.0208  0.0409  -1.8296*    0.0723
Insect3     41    Paratrechina hystrix (Insecta)      0.0081  0.0208  -2.1225*    0.2694
Mite1       828   Ovanoetus sp. (Acari)               0.0086  0.0620  -2.5727*    0.0150*
Mite2       30    Ovanoetus sp. (Acari)               0.0152  0.0348  -2.1831*    0.2911
Mite3       1071  Anoetus sp. (Acari)                 0.0071  0.0678  -2.6276*    0.0101*
Mite4       56    Anoetus sp. (Acari)                 0.0114  0.0242  -1.7765     0.0437
Mite5       34    Anoetus sp. (Acari)                 0.0176  0.0219  -0.7551     0.1516
Mite6       50    Anoetus sp. (Acari)                 0.0111  0.0197  -1.4951     0.0427
Mite7       45    Anoetus sp. (Acari)                 0.0049  0.0098  -1.5594     0.0147
Unknown     66    No BLAST Match                      0.0059  0.0198  -2.2971*    0.3539
Host plant  79    Sarracenia alata                    0.0028  0.0034  -0.4521     0.8483*

A range of genetic variation is present in the sampled OTUs (Table 1). For example, estimates of nucleotide diversity (π) range from ~0.001 to ~0.02, a roughly twenty-fold difference.

Tajima's D values are negative for most taxa (median = -2.0101), with 21 of these values significant, indicative of an excess of segregating sites relative to pairwise diversity in the data sets.

Negative Tajima's D values can be interpreted as resulting from a rapid demographic expansion, or from natural selection, on the marker itself or on a linked gene. This could also be the result of population structure in those OTUs, as collapsing separate populations can increase the number of segregating sites in a taxon. Among taxonomic groups, all fungi have a negative Tajima's D value, with the majority (73%) being

significant. Of note are the Tajima's D values for the arthropods, where all three insects have significantly negative values, and all seven mites have negative values, with three out of seven being significant.
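
The summary statistics discussed above were computed with pegas in R. To make the relationship among them concrete, the following Python sketch computes nucleotide diversity, Watterson's theta, and Tajima's D from a gap-free alignment using the standard formulas (variance constants as in Tajima 1989). The function name summary_stats and the toy alignment are assumptions for illustration; the sketch does not compute significance, and it enumerates all sequence pairs, so it is only practical for small alignments.

```python
from itertools import combinations
from math import sqrt


def summary_stats(seqs):
    """Return (pi per site, Watterson's theta per site, Tajima's D).

    seqs: equal-length aligned sequences with no gaps (a simplifying assumption).
    """
    n, length = len(seqs), len(seqs[0])
    # S: number of alignment columns with more than one state.
    seg_sites = sum(len(set(col)) > 1 for col in zip(*seqs))
    # k: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    var = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (k - theta_w) / sqrt(var) if var > 0 else float("nan")
    return k / length, theta_w / length, tajima_d


# Toy alignment (hypothetical): every variant is a singleton, the pattern
# expected after a recent expansion, so D comes out negative.
toy = ["AAAAA", "TAAAA", "ATAAA", "AATAA"]
pi, theta, d = summary_stats(toy)
```

D is negative whenever the per-sequence theta estimated from segregating sites exceeds the mean pairwise difference, which is the pattern reported for most OTUs in Table 1.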

There are varying levels of population structure across the taxonomic groups.

Roughly half of the fungi contain significant partitioning of genetic variance at the level of the sampling locale, with two taxa also significant at the level of locales within regions

(Fig. 5; Appendix A.2). Sequence-based F statistics display similar patterns, with GST values ranging from 0.003 – 0.280 (average GST = 0.101), suggesting population genetic structure is evident on either side of the Mississippi River in many taxa. Despite this structure, there is considerable sharing of alleles across the Mississippi River in the fungi, although some of the species contain greater sorting than would be expected by chance

(see GSI results; Fig. 5; Appendix A.2). Furthermore, genetic diversity in all but two of the fungi is not correlated with geographic distance (Appendix A.2). Results in the mites are similar, with roughly half of the taxa sampled showing a significant amount of genetic variation distributed among the locales, as well as across the Mississippi River

(Fig. 5, Appendix A.2). F statistics in the mites are slightly lower than those in the fungi

(average GST = 0.081). This structure is also evident in the GSI results, with allelic sorting in most taxa higher than would be expected by chance (Fig. 5; Appendix A.2).

Patterns among the fungi (roughly half of the OTUs), mites and insects generally reflect those of the host plant, with the remaining taxa showing essentially no evidence for this shared genetic structure. Chi-squared goodness of fit tests show that more taxa share

population genetic structure with the host plant than would be expected by chance in many of the analyses (Table 2).


Figure 5. Population genetic structure for the inquiline community spanning the Mississippi River. Results are shown from the AMOVA and GSI analyses. AMOVA analyses show the hierarchical partitioning scheme of locales within regions (ΦSC), locales within total distribution (ΦST), and between regions (ΦCT). GSI analyses represent the amount of allelic sorting on the eastern and western sides of the Mississippi River. Dark cells indicate taxa with significant genetic structure at the corresponding level; Appendix A.2 contains specific values from each analysis. See Carstens and Satler (2013) for sampling information for S. alata, as these samples were collected from throughout the plant’s distribution.

Figure 5 (grid). Columns: ΦSC, ΦST, ΦCT, GSI E, GSI W. Rows: Fungi 1–15, Amoebozoa 1, Alveolata 1, Nematoda 1–3, Insect 1–3, Mite 1–7, Unknown, Host plant.

Table 2. A chi-squared goodness of fit test was used to measure if the number of taxa with significant population genetic structure was more than would be expected by chance alone. Under a null model we would expect a significant result 5% of the time (assuming α = 0.05). Results show that for many analyses, there are more OTUs with significant values than expected by chance, suggesting an association between many members of the community and the host pitcher plant.

Test    χ2        df   p-value          Number Significant   Total Taxa
ΦSC     22.3612   1    2.26 × 10^-6     7                    29
ΦST     80.8004   1    2.20 × 10^-16    12                   29
ΦCT     1.5263    1    0.22             0                    29
GST     1.4278    1    0.23             3                    31
GSIE    20.1715   1    7.08 × 10^-6     7                    31
GSIW    13.4482   1    2.45 × 10^-4     6                    31
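
The test statistics in Table 2 can be reproduced from the counts alone. The Python sketch below assumes a two-category goodness-of-fit test (significant vs. not significant) with expected proportions α and 1 - α; the function name gof_test and the plain-text dictionary keys are hypothetical. For one degree of freedom the upper-tail probability equals erfc(sqrt(χ2 / 2)), which avoids any external statistics library.

```python
from math import erfc, sqrt


def gof_test(n_significant, n_total, alpha=0.05):
    """Chi-squared goodness-of-fit test with two categories (df = 1).

    Compares the observed counts of significant and non-significant taxa
    with the counts expected if each test were significant with probability alpha.
    """
    expected_sig = alpha * n_total
    expected_non = n_total - expected_sig
    observed_non = n_total - n_significant
    chi2 = ((n_significant - expected_sig) ** 2 / expected_sig
            + (observed_non - expected_non) ** 2 / expected_non)
    # For df = 1 the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# (number significant, total taxa) as reported in Table 2.
table2 = {"PhiSC": (7, 29), "PhiST": (12, 29), "PhiCT": (0, 29),
          "GST": (3, 31), "GSI_E": (7, 31), "GSI_W": (6, 31)}
results = {test: gof_test(*counts) for test, counts in table2.items()}
```

With these inputs the χ2 values match the table to four decimal places. The ΦST p-value computed this way is far smaller than the tabulated 2.20 × 10^-16, which may reflect a reporting floor in the software originally used.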

2.4. DISCUSSION

Investigations into the evolutionary history of host plants and their associated insects have provided evidence for co-diversification over long time-periods (e.g.,

Weiblen and Bush 2002) in addition to demographic patterns that suggest a concerted response to abiotic factors over shorter periods of time (e.g., Smith et al. 2011). Inspired by such studies, we sampled a diverse set of organisms (representing similarly diverse ecological interactions) associated with the Pale Pitcher Plant in order to investigate the extent to which this ecological community has co-diversified. Within Sarracenia alata, previous work has demonstrated that populations are genetically structured across the landscape (Koopman and Carstens 2010). Major rivers are important drivers of diversification both in the region (Soltis et al. 2006) and for S. alata (Zellmer et al. 2012), and analysis of the barcode data presented here demonstrates that slightly under half of the eukaryotes sampled share similar population genetic structure with S. alata.

Our results show that a core eukaryotic community exhibits congruent patterns of population genetic structure, with many taxa displaying significant genetic structuring at the level of the sampling locality (based on ΦST); approximately half of the microscopic fungi and half of the mites are structured in a manner similar to that of the host plant (Fig.

5). Given the dispersal capabilities of fungal spores (e.g., Peay et al. 2008), this degree of population genetic structure is strikingly high (but see Taylor et al. 2006). Fungi are ubiquitous in terrestrial habitats, however, with many species associated with soils and plants. Fungal species have also been recovered from pitcher plant leaves, demonstrating their known presence within these microhabitats (reviewed in Adlassnig et al. 2011).

Multiple mite species from the family Histiostomatidae have been described from

Sarracenia pitcher plants (Hunter and Hunter 1964; Fashing and OConnor 1984), and act as prey consumers within the pitchers. Approximately half of the mites identified here exhibited population genetic structure similar to that of S. alata, reflecting structure seen both among sample sites and across the Mississippi River (Fig. 5). Other members of this core group include two of the three sampled insects (all three share a closest BLAST hit to ants), with general strong support across analyses for co-diversification. Ants comprise a large component of prey items for Sarracenia (Newell and Nastase 1998; Ellison and

Gotelli 2009), and one could interpret these results as being indicative of general biogeographic structuring in the longleaf pine savannah habitat. Such results are illustrative of the challenges associated with understanding if interactions among organisms have driven the shared responses to historical events reflected in patterns of co-diversification.


Although ants represent a major prey item of Sarracenia plants, our field work suggests that the plant does not specialize on any particular species of ant. Therefore, ants in general may be considered to have a relatively weak ecological association with the plant, and these results may highlight the strong influence that landscape and abiotic factors have on diversification in the region. Teasing these two interpretations (i.e., ecological association vs. landscape and abiotic factors) apart is non-trivial, yet an understanding of the strength of ecological association, habitat affinity, and dispersal ability can lend insight on this issue. Given the ecological associations shared among many of the inquilines with the host plant in this system, shared population structure does provide evidence that ecology plays a role in shaping diversification patterns through time. As the Mississippi River is an important evolutionary barrier to this system (and many groups across the region), diversification across the river may have taken place via the mechanism of oxbow lake formation, where changes in the river channel moved a portion of the habitat from the eastern to the western side of this barrier.

While slightly under one-half of the sampled taxa share population genetic structure with S. alata, there are other taxa with discordant evolutionary histories. Many of the fungal taxa exhibit little to no population structure, and we suspect that these microscopic species are widespread and not restricted to the pitcher plant bog habitats.

Their dispersal ability is likely to be higher than that of the larger members of this community, allowing them to escape the influence of biogeographic barriers. Other microscopic eukaryotes exhibit no evidence of population structure, including two protozoans and the sampled nematodes, suggesting that biogeographic barriers do not provide an obstacle for long-distance dispersal in these taxa (Finlay 2002). In addition, one insect species demonstrates a lack of population structure. Further investigation of the BLAST result for this OTU (hit to Solenopsis xyloni in original search; see Table 1) indicates that this OTU is an identical match to the invasive red imported fire ant (Solenopsis invicta). Given the devastating impact and colonization power of the red imported fire ant, the lack of population structure is likely a product of its recent introduction to the southeastern United States (from South America). Solenopsis invicta has grown explosively and displaced native species in the region (Porter and Savignano 1990; Stuble et al. 2009), and the lack of structure recovered is consistent with the expectations of an invasive species. Clearly, ecological association and dispersal ability both play a role in the level of congruence detected in population genetic structure across species, although quantifying these two factors, especially dispersal for microscopic eukaryotes, remains a challenge.

Phylogeographic patterns within a species can be informative, but in aggregate, the results across many species make it possible to identify community responses to landscape changes. To date, phylogeographic researchers have not fully utilized metagenomics as a tool for increasing the taxonomic breadth of a comparative study. The

S. alata system is ideal for such studies, as each pitcher provides a self-contained and discrete habitat, where micro- and macroscopic organisms can live and persist in an ecological entanglement. The increased sampling facilitated by metagenomic approaches allowed us to identify a core evolutionary community within S. alata, and the simplest explanation for this congruence is that the core community has diversified in unison because the constituent members are ecologically dependent on S. alata. As such, the OTUs sampled represent an example of shared evolutionary patterns across an ecological community, and suggest that co-diversification is not limited to specialized interactions such as plants and pollinators. The recent discovery of cryptic diversity within S. alata

(Carstens and Satler 2013), together with the work presented here, highlights the need for conserving species like pitcher plants, which play a role in the survival of many different organisms. Such systems contain species that have been ecologically interdependent over evolutionary time scales, thus the loss of substantial diversity in the pitcher plant could lead to loss of diversity in its commensal species.

Given the power of comparative analysis for phylogeographic research, metagenomics can be leveraged to increase our knowledge of the evolutionary processes that lead to biogeographic patterns around the world. In particular, environmental sampling can provide access to taxa spanning the range of ecological and life history traits, as well as greater spatial sampling, which can provide evidence of the landscape processes that have structured species and communities in a region (Bermingham and

Moritz 1998). Potential pitfalls, however, do remain when applying metagenomics for such an analysis. In this study, a large number of sampled OTUs are fungi (Fig. 3), which could be indicative of their ubiquity in nature, but could also be due to our use of primers originally developed from fungal genomic resources. The need to isolate specific gene fragments with primers could have biased the taxonomic sampling, which may have also contributed to the non-asymptotic nature of the rarefaction curves, although this is more likely due to a relatively small number of sequences from next generation sequencing with the sampling strategy used in this study. In addition, challenges exist when using de novo assembly for generating a taxonomic data set, particularly with the requirement of a percent threshold to determine the placement of sequences within OTUs. Although some values are commonly used for certain groups, it is unlikely that a single cutoff is appropriate across the tree of life. Further exploration of the correlation between sequence similarity and taxonomic identity across diverse groups is necessary to better place sequences with their proper OTU. However, as demonstrated here, metagenomic data can be beneficial for phylogeographic studies, with careful and transparent analysis of the data providing valuable insight into the diversification of a region, or in our case, an ecological community composed of a diverse set of lineages.

Remarkably, the co-diversification described here may extend beyond the eukaryotic members of this ecosystem. Koopman and Carstens (2011) provide evidence that the phylogenetic community structure in the bacterial microbiome reflects the population genetic structure of the plant. Since the bacterial microbiome is dominated by

Enterobacteriaceae (Koopman et al. 2010), a family commonly found in animal guts, it could be that the insect members of the core community facilitate colonization of bacteria in the pitchers (which are sterile before opening; see Peterson et al. 2008). If the core arthropods seed the pitchers with Enterobacteria, these microbes may produce enzymes that contribute to the digestive function of the pitcher. Since these complex ecological interactions have likely persisted for hundreds of thousands of years (based on estimates from S. alata), our work underscores the importance of investigating the evolutionary relationships of ecological communities.


2.5. PUBLICATION INFORMATION

This chapter was previously published as: Satler JD, Zellmer AJ, Carstens BC (2016) Biogeographic barriers drive co-diversification within associated eukaryotes of the Sarracenia alata pitcher plant system. PeerJ, 4, e1576. (doi: 10.7717/peerj.1576).


Chapter 3: Phylogeographic concordance factors quantify phylogeographic congruence among co-distributed species in the Sarracenia alata pitcher plant system

3.1. INTRODUCTION

Comparative phylogeography is considered to be a powerful, bottom-up approach that can leverage information from many single taxon studies to identify broad-scale population genetic trends across geographic space (Bowen et al. 2014). Although intraspecific studies may elucidate evolutionary processes and phylogeographic patterns, these results may be species-specific and not reflect overall evolutionary processes driving species and genetic diversity within a region (Page and Hughes 2014). Therefore, phylogeographic research depends on comparative investigations to fulfill its potential as an integrative discipline linking population genetics and (Avise et al.

1987), as similar results across multiple species likely result from shared responses to historical events (e.g., Arbogast and Kenagy 2001; Hickerson et al. 2010; Dawson 2013).

In particular, Avise (2000) suggested that spatially congruent phylogeographic breaks among species be interpreted as shared responses to vicariance events, and the search for phylogeographic concordance has developed into a central objective of comparative phylogeography. The concept of phylogeographic concordance, however, has been entirely qualitative to date. Just as two phylogenetic trees can be either identical in one way or different in endless numbers of ways, phylogeographers have described species histories as either concordant or as not concordant, without any explicit measurement of the degree of discordance. This shortcoming limits the ability of the discipline to identify species that have responded in a concerted manner to large-scale changes in the environment, because monophyly (required before concordance can be recognized based on a visual inspection of genealogy) is dependent on the effective population size (Ne) of the focal organism and requires substantial periods of time to form (Hudson and Coyne 2002). Furthermore, it is only identifiable with sufficient rates of nucleotide substitution, which may complicate comparisons between organisms with substantial differences in mutation rate, such as plants and arthropods.

Comparative phylogeographic investigations have typically focused on particular taxonomic groups that contain a number of co-distributed taxa (e.g., Garrick et al. 2008; Bell et al. 2012; Fouquet et al. 2012; Oaks et al. 2012; Hope et al. 2014; Smith et al. 2014a), with a null model of species responding to historical events in an idiosyncratic manner. In these studies, shared phylogeographic breaks (sensu Swenson and Howard 2005) provide evidence that species responded to landscape-level changes in a similar way. Some comparative studies, however, have focused on ecological communities consisting of species from a variety of taxonomic groups (e.g., Whiteman et al. 2007; Roe et al. 2011; Espindola et al. 2014). Unlike other comparative phylogeographic studies, the expectation for investigations into ecological communities is for phylogeographic concordance across species, because the interacting members of an ecological community are to some extent codependent and thus should respond in a concerted manner to landscape changes. Phylogeographic investigations that document shared demographic responses to climatic and environmental events provide evidence for the stability of certain ecological interactions in evolutionary time (Smith et al. 2011). Although ecological interactions have long been thought to influence species diversification (e.g., Darwin 1859), the lack of a quantitative measure of concordance has prevented assessment of the degree to which ecological interactions and phylogeographic concordance are correlated.

A novel approach for measuring phylogeographic concordance is enabled by species tree phylogenetic methods (e.g., Edwards et al. 2007; Kubatko et al. 2009; Heled and Drummond 2010). We follow in historical biogeography (sensu Nelson and Platnick 1981) by aggregating samples from particular geographic regions into operational taxonomic units (OTUs). With such OTUs, concordance factors (Baum 2007) can be used to quantify the degree of phylogeographic concordance and to identify community-level patterns of diversification among the species. Concordance factors were originally introduced to infer a concordance tree (i.e., a species tree) from multiple gene trees, under the assumption that the most represented clade in the gene trees is also present in the species tree. Concordance factors do not require any assumptions about the evolutionary factors that cause incongruence among gene trees, nor do they explicitly estimate parameters such as population size or divergence times. As such, they represent an elegant approach for quantifying congruent genetic structure across co-distributed species.


Concordance factors can be easily extended to comparative phylogeography, because recovered clades represent the proportion for which these relationships are true for a group of organisms. Phylogeographic concordance factors (PCFs) quantify one of the most important concepts in comparative phylogeography, concordance in the geography of phylogenetic breaks across co-occurring species (Type III concordance; Avise 2000), and secondarily provide an estimate of the dominant pattern of diversification from multiple, co-distributed species. In the PCF framework, nodal support values (ranging from 0 to 1) designate the fraction of the community for which a particular clade is represented within the species’ history. Communities exhibiting ecological relationships that are stable through time should contain higher phylogeographic concordance factors (as measured by PCF scores that approach 1), demonstrating a shared pattern of diversification. Our expectation is that tightly associated species will respond to landscape and historical events in a concerted manner, and this will be reflected in similar phylogenetic patterns. Alternatively, communities with species that exhibit an idiosyncratic response to historical events would be expected to contain lower phylogeographic concordance factors. In this paper, we calculate phylogeographic concordance factors for an ecological community, conduct simulations to generate an expectation on the range of PCFs, and test the hypothesis that ecological association is a predictor of phylogeographic congruence.

Ecological communities, defined here as an aggregate of co-distributed species that are associated ecologically, contain some set of species that interact for at least part of their life cycle. These ecological interactions can be facultative or obligate; for example, species may be dependent upon other species for the acquisition of important nutrients, reproduction, or as hosts for a parasitic lifestyle. If these interactions are stable over long periods of time, species with obligate interactions should exhibit a high degree of phylogeographic concordance that may eventually manifest itself as a co-phylogenetic pattern (Johnson and Stinchcombe 2007). For example, figs and fig wasps are obligate mutualists that share an intimate relationship whereby the insect is the sole pollinator of the plant. Phylogenetic data for both groups support a pattern of co-diversification dating back 60 Ma (Rønsted et al. 2005). Host / parasite interactions can also be stable in evolutionary time, as best illustrated in pocket gophers and chewing lice (Hafner et al. 2003). Such studies demonstrate that ecological associations can extend into evolutionary time, and that complex processes that promote diversification can impact species in similar ways. If the evident co-phylogeny in these ecologically dependent species reflects co-diversification on the landscape scale, shared phylogeographic breaks on the landscape level should be common (e.g., Jousselin et al. 2008). By measuring the phylogeographic concordance at these early stages (i.e., on the landscape level), the implicit prediction of such co-phylogenetic studies can be tested: that genetic concordance and the degree of ecological interaction are positively correlated.

An ecological community well suited for comparative phylogeographic research is the Sarracenia alata carnivorous pitcher plant system. Sarracenia alata inhabits longleaf pine savannahs and fens in the Gulf Coast of North America (McPherson 2007).

Their leaves form as hollow, water-filled structures that enable the plant to capture and digest prey items, an adaptation for nutrient acquisition (Darwin 1875). These pitchers also comprise a unique ecosystem, where a broad diversity of associated organisms share complex interactions and relationships enabling the plant to act as host for a diversity of species (Adlassnig et al. 2011). Within S. alata, a diverse group of arthropods are known to interact ecologically with the plant, and can be categorized as obligate symbionts, herbivores, or capture interrupters (reviewed in Folkerts 1999). Obligate symbionts include several flesh fly species (family Sarcophagidae); these flies use the pitchers as an environment for larval deposit and development, with the plant providing both shelter and prey items during this life stage (Dahlem and Naczi 2006). Two described species of mites inhabit the pitchers; one is a scavenger in the pitcher fluid (Sarraceniopus hughesi), and the other feeds on nematodes and microscopic arthropod species (Macroseius biscutatus). The entire life cycle of the noctuid moth, Exyra semicrocea, occurs within the pitcher tubes, with the larval stage feeding upon the inner tissue of the leaf. A recent investigation into this moth utilizing mitochondrial DNA recovered a complete lack of haplotype sharing across the Mississippi River, and additional population structure east of the river (Stephens et al. 2011). In addition, multiple species of spiders act as capture interrupters, and opportunistically capture prey items that are attracted to the pitcher leaves. Although not restricted to the pine savannahs, the spiders live in close proximity to S. alata in these habitats.


Figure 6. Distribution of Sarracenia alata in the southeastern United States, with major rivers represented on the map. Broken lines indicate the known distribution of S. alata. Circles correspond to sampling sites for the study, with pattern of circles indicating the two hypothesized evolutionary lineages of the host plant (Carstens and Satler 2013). Locales are as follows: Sundew (S), Pitcher (P), Bouton Lake (B), Red Dirt (R), Cooter’s Bog (C), Kisatchie (K), Lake Ramsey (L), Talisheek (T), Abita Springs (A), De Soto (D), Franklin Creek (F), and Tibbie (Tb). Taxa varied in the number of locales in which they are represented; details on the sampling distribution of each taxon can be found in Appendix B.10. Inset picture is of S. alata from the eastern lineage.

Sarracenia alata and its arthropod community offer an ideal system for an exemplar community phylogeographic study. Population structure has been demonstrated at both shallow and deep levels within S. alata across its distribution (Zellmer et al. 2012; Carstens and Satler 2013), with congruent population genetic structure exhibited across multiple taxonomic groups recovered from the pitcher plant fluid (Satler et al. 2016). But apart from E. semicrocea (Stephens et al. 2011), little is known regarding the phylogeographic structure of the associated arthropod species. Here we generate DNA sequence data from six arthropod species to investigate the evolutionary history of the S. alata pitcher plant system. We predict that ecological relationships correlate with phylogeographic congruence, use phylogeographic concordance factors to quantify the amount of phylogeographic congruence among members of the community, and identify species that do not share an evolutionary history with the pitcher plant system.

3.2. MATERIAL AND METHODS

3.2.1. Taxon Sampling

Samples from six arthropod species known to interact ecologically with S. alata were collected from throughout S. alata’s distribution (Fig. 6). There were 12 total sampling sites; the sampling, however, varied across the locales (see Appendix B.10 for sampling information). Arthropods included two species of flesh flies (Fletcherimyia celarata, Sarcophaga sarraceniae), one moth (Exyra semicrocea), one mite (Macroseius biscutatus), and two spiders (Misumenoides formosipes, Peucetia viridans).

All individuals were captured in or near the pitcher plant, with most resting on or just under the lid of the pitcher. For the two flesh fly species, males were pinned in the field with their genitalia extracted to confirm proper identification. For those fly specimens, three legs were removed and placed in 95% EtOH for DNA preservation; all other specimens were placed directly in 95% EtOH. Upon returning to the lab, all samples in ethanol were transferred to a -80 °C freezer for sample preservation. DNA was extracted from either crushed legs (flies, spiders) or full body soaking (moth, mite) using a Qiagen DNeasy kit; resulting samples were amplified for the mitochondrial cytochrome c oxidase subunit I (COI) barcode gene (Folmer et al. 1994) via polymerase chain reaction (PCR), with subsequent products sequenced in both directions. Sequences were edited using Sequencher v4.8 (Gene Codes Corporation, MI), and manually aligned using MacClade v4.08 (Maddison and Maddison 2005). Previously sequenced chloroplast and nuclear DNA (eight loci) was used for S. alata (Koopman and Carstens 2010; Zellmer et al. 2012; Carstens and Satler 2013).

3.2.2. Population Genetic Structure

Multiple summary statistics were generated to characterize genetic variation in the organellar genomes of the arthropod species. The packages ape (Paradis et al. 2004) and pegas (Paradis 2010) in the statistical platform R (R Development Core Team 2013) were used to calculate standard population genetic statistics, including π and Watterson’s θ.
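As an illustration of the two summary statistics named above, the following is a minimal Python sketch of per-site nucleotide diversity (π) and Watterson’s θ. It is not the ape or pegas implementation used in the study; the function names and the toy alignment are hypothetical, and the sketch assumes an aligned set of equal-length sequences with no gaps or ambiguity codes.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Number of alignment columns showing more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * len(seqs[0]))


def watterson_theta(seqs):
    """Watterson's theta per site: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / (a_n * len(seqs[0]))


# Hypothetical toy alignment (four sequences, eight sites); not study data.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
pi = nucleotide_diversity(toy_alignment)
theta_w = watterson_theta(toy_alignment)
print(f"pi = {pi:.4f}, theta_W = {theta_w:.4f}")
```

Both statistics estimate the same population parameter under neutrality, so reporting them side by side gives a quick check on departures from a constant-size, neutral model.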

GST values (Nei 1973) were calculated for each species in gstudio (Dyer 2012); these values measure the level of genetic partitioning among the sampling locales. As GST takes into account only haplotype data, an analysis of molecular variance (AMOVA; Excoffier et al. 1992) was also estimated for each of the species to take full advantage of the sequence data.

This was tested in a hierarchical manner, assessing genetic variation for (i) sampling locales within each side of the Mississippi River, (ii) sampling locales in the total distribution, and (iii) regions (i.e., east and west of the Mississippi River) within the total distribution. AMOVAs were calculated in the program SPADS (Dellicour and Mardulyn 2014), with 10,000 permutations to generate levels of significance. Spatial principal component analysis (sPCA) in adegenet (Jombart 2008) and redundancy analysis (RDA) in vegan (Oksanen et al. 2007) were run to test how sampling location influenced genetic partitioning in the data, and to test if genetic breaks coincide with the Mississippi River.
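To make the GST statistic used in this section concrete, the sketch below computes Nei’s GST = (HT - HS) / HT from haplotype labels grouped by locale. This is an illustrative assumption about the basic estimator, not the gstudio code: it uses an unweighted mean across locales and applies no sample-size correction, and the locale names and haplotype labels are hypothetical.

```python
from collections import Counter


def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype label in one locale."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {h: c / n for h, c in counts.items()}


def nei_gst(populations):
    """populations: dict mapping locale name -> list of haplotype labels."""
    freqs = [haplotype_frequencies(h) for h in populations.values()]
    # H_S: mean expected heterozygosity within locales (unweighted).
    h_s = sum(1 - sum(p * p for p in f.values()) for f in freqs) / len(freqs)
    # H_T: expected heterozygosity computed from mean haplotype frequencies.
    all_haps = set().union(*freqs)
    mean_freq = {h: sum(f.get(h, 0.0) for f in freqs) / len(freqs)
                 for h in all_haps}
    h_t = 1 - sum(p * p for p in mean_freq.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Hypothetical example: two locales fixed for different haplotypes.
fixed = {"west": ["h1"] * 4, "east": ["h2"] * 4}
# Hypothetical example: two locales with identical haplotype frequencies.
mixed = {"west": ["h1", "h2"], "east": ["h1", "h2"]}
print(nei_gst(fixed), nei_gst(mixed))
```

Because only haplotype frequencies enter the calculation, two haplotypes that differ by one substitution are treated the same as two that differ by many, which is the limitation that motivates the sequence-based AMOVA.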

To test for a correlation between geographic and genetic distance, isolation by distance (IBD) values were calculated with the IBDWS v3.23 web server (Jensen et al. 2005). Geographic matrices were constructed measuring the Euclidean distance between sampling sites (in kilometers) with the distance measurement tool in Google Earth (www.google.com/earth/, last accessed 27 Oct 2015); genetic distance matrices for each species were generated using a Kimura 2–parameter (K2P) substitution model. In addition, maximum likelihood (ML) estimates of the gene tree topologies were estimated using Garli v2.01 (Zwickl 2006). Models of DNA sequence evolution were estimated in jModelTest v0.1 (Guindon and Gascuel 2003; Posada 2008). For each Garli analysis, data sets were partitioned by codon position, with 1000 bootstrap replicates to generate nodal support values. Support values were summarized on each tree with SumTrees v4.0.0 (Sukumaran and Holder 2015) from the Python library DendroPy v4.0.3 (Sukumaran and Holder 2010).
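The K2P genetic distances used for the isolation by distance analysis can be illustrated with a short Python sketch. It applies the standard Kimura two-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. The function name and example sequences are hypothetical, and this is not the code run by the IBDWS server.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical pair: one transition (A/G) and one transversion (A/C) in 10 sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")
print(round(d, 4))
```

The corrected distance exceeds the raw proportion of differing sites (0.2 in this toy pair) because the model accounts for multiple substitutions at the same site.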

3.2.3. Phylogeographic Concordance Factors

Phylogeographic concordance factors (PCFs) are calculated in a three-step process: (i) posterior distributions of species trees are estimated for each species independently, with all OTUs (i.e., tips in the trees representing geographic areas) represented across species, (ii) the frequency of each unique topology (relationships only) within each species tree distribution is summarized, and (iii) tree distribution summaries are combined across all species to generate a concordance tree. Nodal support values of the resulting concordance tree are the phylogeographic concordance factors, with the tree pattern representing the estimate of community diversification.
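Steps (ii) and (iii) of the process above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the PCFs.py script, mbsum, or BUCKy: it assumes each posterior tree has already been reduced to its set of clades, treats species as independent, and takes the concordance factor of a clade to be its posterior frequency averaged across species. It also omits the final step of assembling compatible clades into a concordance tree. The OTU labels, species names, and tree counts are hypothetical.

```python
from collections import Counter


def topo(*clades):
    """Represent a rooted topology as the set of its non-trivial clades."""
    return frozenset(frozenset(c) for c in clades)


def topology_frequencies(posterior_trees):
    """Step ii: frequency of each unique topology in one posterior sample."""
    counts = Counter(posterior_trees)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}


def clade_support(topology_freqs):
    """Posterior frequency of each clade within one species."""
    support = Counter()
    for topology, freq in topology_freqs.items():
        for clade in topology:
            support[clade] += freq
    return dict(support)


def concordance_factors(per_species_posteriors):
    """Step iii (simplified): mean clade frequency across species."""
    n_species = len(per_species_posteriors)
    totals = Counter()
    for trees in per_species_posteriors.values():
        support = clade_support(topology_frequencies(trees))
        for clade, value in support.items():
            totals[clade] += value / n_species
    return dict(totals)


# Hypothetical OTUs: two west of a barrier (W1, W2), three east (E1, E2, E3).
t1 = topo(("W1", "W2"), ("E1", "E2"), ("E1", "E2", "E3"))
t2 = topo(("W1", "W2"), ("E2", "E3"), ("E1", "E2", "E3"))
t3 = topo(("W1", "E1"), ("E2", "E3"), ("W2", "E2", "E3"))
posteriors = {
    "species_a": [t1] * 8 + [t2] * 2,
    "species_b": [t1] * 5 + [t3] * 5,
}
cfs = concordance_factors(posteriors)
for clade, cf in sorted(cfs.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), round(cf, 2))
```

In this toy case the clade uniting the two western OTUs is present in every tree of one species and half the trees of the other, so its averaged support is intermediate between the two.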

Sarracenia alata pitcher plant system

PCFs were calculated to test for a correlation between ecology and phylogeography, infer the dominant pattern of community diversification, and identify the species that share a congruent phylogeographic history with the host plant. Zellmer et al. (2012) proposed that habitat isolation facilitated by large rivers is the primary cause of population genetic structure in S. alata. This supports the idea that sampling locales within these regions were connected by suitable habitat prior to European settlement and subsequent land conversion and habitat loss (Noss et al. 1995). In order to calculate PCFs, we follow Zellmer et al. (2012) and define OTUs as the geographic regions divided by these major rivers. Specifically, sample locales were collapsed into the following five OTUs: Bouton Lake (B) + Sundew (S) + Pitcher (P); Cooter’s Bog (C) + Red Dirt (R); Abita Springs (A) + Lake Ramsey (L) + Talisheek (T); De Soto (D); and Franklin Creek (F) + Tibbie (Tb) (see Fig. 6 for details); samples were not included from Kisatchie (K) due to incomplete sampling of arthropod taxa. In addition, M. formosipes and F. celarata were not included as they did not contain sufficient sampling, leaving five species for the PCF analysis.

Species tree distributions for each species were estimated with *BEAST v1.8.2 (Heled and Drummond 2010), associating alleles from each locale with their respective OTU. COI data sets were partitioned by codon position, with models of DNA sequence evolution as estimated above. Although single locus data are not ideal for species tree estimation, and we do not advocate this application absent a framework such as PCFs, mitochondrial DNA has been the marker of choice for phylogeography since the inception of the discipline, and in many cases does a reasonable job of tracking population history. Each individual analysis was performed under a strict molecular clock for 2.0 × 10^8 generations, with 10,000 trees saved in the posterior distribution and the first 10% of trees discarded as burn-in. For S. alata, an eight-locus data set was used (one chloroplast and seven nuclear loci), including Sarracenia rubra as an outgroup for rooting purposes (data from Carstens and Satler 2013), and run for 5.0 × 10^8 generations (10,000 trees saved and 10% discarded as burn-in). Log files were imported into Tracer v1.6 (Rambaut et al. 2014) to check for convergence. Each species tree distribution was summarized using mbsum (Larget et al. 2010) to count the number of times each unique topology was represented within the posterior distribution. BUCKy (Ané et al. 2007) was then used to process the resulting tree summary files, generating a concordance tree with concordance factors. An alpha value of infinity was used, which represents a null hypothesis of no congruence. This approach is conservative, and thus requires strong support across two or more species before an inference of shared phylogeographic history is reached (based on PCF values).

A custom Python script (PCFs.py) was written to conduct steps ii and iii of the pipeline (https://github.com/jordansatler/PhylogeographicConcordanceFactors).


If one or more species has not evolved with the community, it would be desirable to identify those taxa. To determine if a subset of taxa best fit a model of co-diversification, all possible combinations of N taxa were analyzed using the pipeline, ranging from pairing just two species together to N - 1 species. To compare the suite of models, average nodal support values (i.e., the average value of their community tree concordance factors) were calculated and partitioning schemes within each K level were ranked based on this value. Essentially, we would expect a dramatic increase in average PCF values if taxa that show discordant phylogeographic histories were removed from the analysis. Although there may be a hierarchical structure of expectations for PCF values (i.e., the deepest part of the trees are split by a biogeographic barrier, resulting in a high PCF value for that node, but there is relatively little substructure towards the tips in the trees, resulting in low PCF values for those nodes), averaging these values provides a way to quantitatively compare the various taxon partitioning schemes and identify potential taxa that do not fit within a community model.
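The subset comparison described above amounts to enumerating every combination of K taxa for K = 2 to N - 1 and ranking the combinations within each K by their average PCF. A minimal Python sketch follows; the scoring function is passed in as an argument because the actual average PCF comes from the full pipeline, and the toy scorer, taxon labels, and scores used here are hypothetical placeholders rather than results.

```python
from itertools import combinations


def rank_taxon_subsets(taxa, mean_pcf):
    """Rank all subsets of size K = 2 .. N-1 by average PCF within each K.

    mean_pcf is any callable that takes a tuple of taxon names and
    returns the average concordance factor for that subset.
    """
    ranked = {}
    for k in range(2, len(taxa)):
        scored = [(mean_pcf(subset), subset)
                  for subset in combinations(taxa, k)]
        ranked[k] = sorted(scored, key=lambda item: item[0], reverse=True)
    return ranked


def toy_mean_pcf(subset):
    """Hypothetical scorer: pretend 'sp5' is discordant with the rest."""
    return 0.5 if "sp5" in subset else 0.9


taxa = ["sp1", "sp2", "sp3", "sp4", "sp5"]
ranked = rank_taxon_subsets(taxa, toy_mean_pcf)
for k, schemes in ranked.items():
    best_score, best_subset = schemes[0]
    print(k, len(schemes), best_score, best_subset)
```

With five taxa this evaluates 10, 10, and 5 subsets at K = 2, 3, and 4. The number of subsets grows quickly with N, so exhaustive enumeration is practical only for small communities.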

Simulations for pairwise comparisons

Simulations were conducted to generate expectations of PCF values in pairwise comparisons (K = 2). Twenty species trees were simulated under a Yule process with five OTUs and a total tree depth of 20N; trees varied randomly with regard to both interrelationships and branch lengths. Ten genealogies were simulated from each species tree with ms (Hudson 2002), and DNA sequence data were evolved from these genealogies using Seq-Gen (Rambaut and Grassly 1997) under an HKY model. After generating the sequence data, each simulated DNA matrix was treated as an individual (analogous to the COI data from each arthropod), and species tree distributions were estimated using *BEAST following conditions stated above. In total, the analysis produced 200 species tree distributions, with ten distributions per species tree topology.

PCFs were calculated in a pairwise manner (i.e., the K = 2 level), testing taxon pairs that were generated from the same species tree. Average PCF values were summarized to generate a distribution of expected values when species shared the same phylogeographic history. Simulations therefore take into account coalescent stochasticity and species tree estimation error.

Proof of concept in other systems

We applied the PCF method to two additional systems, varying in ecological associations among the included species. We analyzed Sanger sequence data described in Carstens et al. (2005) from the Pacific Northwest (PNW) temperate rainforest, and ultraconserved DNA element (UCE) data described by Smith et al. (2014b) from Neotropical birds. The former included three amphibians (one frog and two salamanders), a small mammal (vole), and a tree (willow) distributed across the Cascade and Rocky Mountains (see Carstens et al. 2005 for details). Species tree distributions were estimated with *BEAST (as described above) from a combination of single locus (frog, Dicamptodon, vole, willow) and multilocus (Plethodon) data sets, with four geographic regions constituting the OTUs. The latter included next-generation sequence data from Smith et al. (2014b), consisting of four bird species spanning three biogeographic barriers in South and Central America (see Smith et al. 2014b for sampling details). We included four of the five species from the investigation, as the fifth species was not sampled from all geographic regions. Schiffornis turdina contained one region that Smith et al. (2014b) divided into two, but since these two regions formed a clade with pp = 1.0, we collapsed the node and treated the two areas as a single region in our analysis. Species tree distributions were provided by B. Smith (those analyzed in Smith et al. 2014b); each species tree distribution was estimated with between 115 and 144 UCE loci (varying by species). PCF analyses for both systems were run as outlined above.

3.2.4. Tests of Simultaneous Divergence in S. alata System

The msBayes model (Hickerson et al. 2006, 2007; Huang et al. 2011) was used to test for simultaneous divergence across the Mississippi River among the community members. In contrast to PCFs, msBayes evaluates the timing of divergence among population pairs to understand if a model of simultaneous divergence best explains the data. Given the complexity of the model, a full likelihood approach is computationally intractable and hierarchical Approximate Bayesian Computation (hABC) is used. The msBayes method estimates hyperparameters that are of interest to the researcher, including posterior probabilities on the number of divergence episodes and the mean and variance of the timing of the divergence episodes. An appealing aspect of this method is that it takes into account both mutational rate variation and coalescent stochasticity within each species, so lineage-specific aspects of their molecular evolution are accounted for when estimating if the timing of divergence is synchronous.


We used an isolation only model where the Mississippi River divides the community and migration ceases following this split. The Mississippi River is a well-characterized biogeographic barrier (reviewed in Soltis et al. 2006), with a lack of suitable habitat for S. alata in the Atchafalaya Basin leading to a disjunct distribution in the plant. Research on S. alata shows the deepest split in the population tree corresponding with the Mississippi River (Zellmer et al. 2012), with populations restricted to either side of the Mississippi River representing evolutionary lineages (Carstens and Satler 2013). In addition, four of the six arthropods show reciprocal monophyly in their COI genealogies, as well as population genetic structure in spatial analyses corresponding with the Mississippi River (see Results).

We applied the updated hABC model implemented in PyMsBayes (Oaks 2014), with all species (rps16–trnK gene used for S. alata) included in the analysis. PyMsBayes has multiple changes that allow for improved accuracy with the msBayes model; primarily, rather than placing a uniform prior distribution on the number of divergence episodes, PyMsBayes places a Dirichlet process prior on all of the divergence models, with gamma prior distributions used on many parameters, including population divergence (τ) and population size (θ). The divergence episode prior distribution was centered around three divergence episodes, as this represents our prior hypothesis of diversification; the two spider species are characterized as capture interrupters, and do not show any obligate association with the plant. In contrast, given the life history traits of the other arthropod species, we may expect the ecologically associated species to show similar divergence patterns to S. alata. Population size parameters were either estimated individually (ancestral and both daughter populations) or as one (all populations given the same population size parameter). Reciprocal monophyly in a single-locus data set creates difficulty in estimating both the timing of population divergence and ancestral population size, as the lack of incompletely sorted alleles leaves little information about the size of the ancestral population (Edwards and Beerli 2000). Given that four of the seven species do show this pattern (see Results), difficulty in estimating ancestral population size could lead to inaccuracy in the estimation of divergence times; restricting population sizes to a single parameter may help combat this issue. The prior distribution on divergence times had an upper limit at the Pliocene / Pleistocene boundary, with much of that distribution less than 1 Ma, consistent with divergence estimates of S. alata (Zellmer et al. 2012). As S. alata is the host for the ecological community, we informed prior choices based on our knowledge of the evolutionary history of the plant. Generation times and mutation rates were modeled independently in each species; specific details on the PyMsBayes configuration files can be found in Appendix B.16. Analyses were run for 3 × 10^6 generations, with 1,000 samples drawn to approximate the posterior distribution. The simulated and empirical data were compared using the default combination of summary statistics, with the unadjusted posterior estimate used to represent the posterior distribution (Wegmann et al. 2010). The vector of the summary statistics was not re-ordered, preserving the original taxon order of the observed data as suggested by Oaks (2014).
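To give a sense of what "centered around three divergence episodes" implies under a Dirichlet process, the sketch below uses the standard result that, for concentration parameter α and n elements, the expected number of categories is the sum of α / (α + i) for i = 0, ..., n - 1. It then solves for the α that yields three expected episodes among seven taxa. This is a simplification that fixes α at a single value; the function names are hypothetical, and the settings actually used are those in the configuration files of Appendix B.16.

```python
def expected_episodes(alpha, n_taxa):
    """Expected number of categories under a Dirichlet process."""
    return sum(alpha / (alpha + i) for i in range(n_taxa))


def solve_alpha(target, n_taxa, lo=1e-6, hi=100.0, tol=1e-10):
    """Bisection search; expected_episodes increases monotonically with alpha."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_episodes(mid, n_taxa) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


alpha = solve_alpha(target=3.0, n_taxa=7)
print(round(alpha, 3), round(expected_episodes(alpha, 7), 3))
```

Larger values of α shift prior mass toward more divergence episodes, which is the direction of the sensitivity analysis centered on six episodes.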

3.3. RESULTS

3.3.1. Genetic Sampling


Between 21 and 138 COI mtDNA sequences were generated for each arthropod species (Table 3). All newly generated sequences have been deposited in GenBank under accession numbers KU975801 – KU976138 (see Appendix B.10 for more details).

Sequences from the rps16–trnK chloroplast gene (Koopman and Carstens 2010) and seven anonymous nuclear loci (Zellmer et al. 2012; Carstens and Satler 2013) were gathered from previous studies for the analysis of S. alata. Data sets and input files have been deposited in Dryad (doi:10.5061/dryad.475t2).

Table 3. Sampling and sequencing information for all arthropod species. Information includes the number of sequences (N) and unique haplotypes (H) per species, segregating sites (SS), nucleotide diversity (π), Watterson’s θ per site (θw), Tajima’s D, and GST values. Asterisks indicate significant p-values below 0.05.

Species                    N     H    SS   π        θw       Tajima's D   GST
Sarcophaga sarraceniae     157   13   16   0.0059   0.0046    0.7060      0.4774*
Fletcherimyia celarata     33    4    7    0.0028   0.0028    0.0105      0.5859*
Exyra semicrocea           37    19   31   0.0142   0.0117    0.7506      0.4453*
Macroseius biscutatus      21    13   71   0.0515   0.0309    2.4239*     0.6146*
Peucetia viridans          56    18   28   0.0041   0.0096   -1.8789      0.0834
Misumenoides formosipes    34    28   38   0.0086   0.0149   -1.5251      0.2927

3.3.2. Population Genetic Structure

Genetic diversity varied across sampled taxa (Table 3). Some species exhibited high numbers of unique haplotypes (e.g., M. biscutatus, 13 out of 21), while others had low haplotype diversity (e.g., F. celarata, 4 out of 33). Nucleotide diversity ranged from about 0.003 to 0.05, and Watterson’s θ ranged from 0.003 to 0.03. Values of Tajima’s D varied from -1.88 to 2.42, with one taxon containing a significant D statistic (M. biscutatus, 2.42). Population structure is evident in multiple taxa within the data set. GST values are high, with an average of 0.478 (Table 3). Except for the spider species, GST values from all species are statistically significant, with values greater than 0.445. Results from the AMOVAs (Table 4) are similar, with significant genetic structuring in a minimum of one level for all species with the exception of the two spiders, where no structure was detected; the moth and mite show significant population structure at all three levels. Population structure is also recovered in the sPCA (Appendix B.1) and RDA analyses (Appendix B.11) for the non-spider arthropods. Limitations in dispersal ability and ecological associations with the host plant are likely contributing to the high levels of population genetic structure. This pattern is further reflected in the COI gene tree estimates (Appendix B.2–B.7). Reciprocal monophyly is detected in all species except the spiders, and highlights the Mississippi River as a biogeographic barrier influencing genetic variation in many of these species.
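For readers less familiar with the statistic, Tajima's D contrasts the two diversity estimates reported in Table 3: the mean number of pairwise differences and Watterson's estimator based on segregating sites. The sketch below implements the standard formula. It expects per-sequence (not per-site) quantities, so the per-site values in Table 3 cannot be plugged in directly without the alignment length; the example inputs are hypothetical.

```python
import math


def tajimas_d(n, seg_sites, mean_pairwise_diffs):
    """Tajima's D from sample size, number of segregating sites, and the
    mean number of pairwise differences (both per sequence, not per site)."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    theta_w = seg_sites / a1                      # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diffs - theta_w) / math.sqrt(variance)


# Hypothetical inputs, not taken from Table 3.
print(tajimas_d(n=20, seg_sites=30, mean_pairwise_diffs=12.0))
```

D is positive when pairwise diversity exceeds Watterson's estimate (as in M. biscutatus) and negative when it falls below it (as in the two spiders).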

Table 4. AMOVA and IBD results. Asterisks indicate significant p-values below 0.01. Isolation by distance was not calculated in M. formosipes due to limited sampling in multiple localities.

                           AMOVA                           IBD
Species                    ΦSC       ΦST       ΦCT        r         RMA slope   r²
Sarcophaga sarraceniae      0.0117    0.9362*   0.9355*   0.8635*   0.00319     0.746
Fletcherimyia celarata     -0.1879    0.9690*   0.9739    0.8087    0.00564     0.654
Exyra semicrocea            0.4548*   0.9059*   0.8274*   0.6782*   0.00242     0.460
Macroseius biscutatus       0.4911*   0.9782*   0.9571*   0.6034*   0.00258     0.364
Peucetia viridans          -0.0148   -0.0125    0.0023    0.0478    0.00078     0.003
Misumenoides formosipes    -0.0406   -0.0427   -0.0021    NA        NA          NA

Strong support values are seen in the trees, although finer-scale structure is less apparent in some of the taxa. Overall, the spiders show essentially no population genetic structure, a result that may reflect their ability to disperse over large distances (e.g., Greenstone et al. 1987).

3.3.3. Phylogeographic Concordance Factors

Results from S. alata system

Pairwise comparisons between each arthropod species and the host plant demonstrate that ecological association is a good predictor of phylogeographic congruence (Fig. 7). In particular, the three species known to interact ecologically with S. alata have average PCF values above 0.71 (highest between the mite and host plant at 0.79), while the average PCF value for the spider and plant is 0.45. To assess the significance of these values, we analyzed simulated data from 900 pairwise comparisons for taxa where each species pair was from the same model of diversification (i.e., the same species tree); these results provided a benchmark for interpreting the PCF values calculated from the empirical data (Appendix B.8). The 90% highest posterior density interval for PCFs (90% HPD = 0.71 – 0.99) suggests that the values observed in the species that interact ecologically with the plant are indicative of co-diversification. The value for the spider, on the other hand, is sufficiently low as to suggest no shared history of diversification. This result holds when the 95% highest posterior density, a more inclusive interval, is used (95% HPD = 0.62 – 0.99).
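The benchmark logic described above can be sketched as follows. The helper finds the narrowest interval that contains a given fraction of simulated values, which is one common way to compute an HPD interval from samples; it is offered as an illustration and is not necessarily the procedure used for Appendix B.8. The short list of simulated values is hypothetical, whereas the reported interval and the pairwise PCF values come from the text and Table 5.

```python
import math


def hpd_interval(samples, mass=0.90):
    """Return the narrowest interval that contains `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover (the small epsilon guards
    # against floating-point error in mass * n).
    n_inside = max(1, math.ceil(mass * n - 1e-9))
    widths = [ordered[i + n_inside - 1] - ordered[i]
              for i in range(n - n_inside + 1)]
    start = widths.index(min(widths))
    return ordered[start], ordered[start + n_inside - 1]


# Hypothetical simulated PCF averages, standing in for the 900 comparisons.
simulated = [0.60, 0.70, 0.72, 0.80, 0.85, 0.86, 0.88, 0.90, 0.95, 0.99]
low, high = hpd_interval(simulated, mass=0.90)

# Reported 90% HPD and pairwise PCF values with the host plant (Table 5).
reported_low, reported_high = 0.71, 0.99
pairwise = {"M. biscutatus": 0.79, "S. sarraceniae": 0.77,
            "E. semicrocea": 0.72, "P. viridans": 0.45}
inside = {name: reported_low <= value <= reported_high
          for name, value in pairwise.items()}
print((low, high), inside)
```

Only the spider falls outside the reported interval, matching the interpretation given in the text.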


When PCFs are calculated from sets of more than two species, they provide evidence for a shared phylogeographic history among the members of the community. With all five species included, fairly strong support emerges for a shared east / west split, with concordance factors of 0.79 for a western clade and 0.77 for an eastern clade (Fig. 8A). A concordance factor of 0.34 shows less support for a clade uniting ALT and D, suggesting that phylogeographic relationships within the eastern region are less concordant across the species. Results from conducting all permutations of taxa at all K levels (greater than one) provide evidence that P. viridans has not co-diversified with the community (Table 5). For example, when all permutations are run at K = 4 (dropping one species each time), average nodal support values are substantially higher when the spider is not included. At the K = 3 level, all permutations not including P. viridans again show a substantially higher PCF average than any community compositions including the spider. This pattern is also evident at the K = 2 level, indicating that P. viridans has not co-diversified with the community. Removing the outlier species from the analysis returns the same community tree (Fig. 8A), but increases nodal support values for the western (to 0.92) and eastern (to 0.95) clades. As further substructure in the east is not well supported (increase to 0.42), PCFs suggest that response to the Mississippi River has driven the shared phylogeographic patterns seen in this community. The pattern can be visualized with the maximum clade credibility (MCC) trees from each of the species, where two distinct groups are seen (Fig. 8B).


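[Figure 7 (image not reproduced). Y-axis: PCF value (0.4 to 0.9). X-axis labels: Herbivore, Predator, Capture Interrupter, Prey Consumer, arranged along a gradient from generalist to obligate. Points are labeled M. biscutatus, S. sarraceniae, E. semicrocea, and P. viridans.]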

Figure 7. Phylogeographic concordance factors between each arthropod species and the host pitcher plant. Pairwise comparisons represent the amount of congruence between the species’ posterior distributions of species trees (see Table 5 for specific PCF values). Higher host-plant association correlates with a higher average phylogeographic concordance factor. Ecological categorical types from Folkerts (1999).


Table 5. Phylogeographic concordance factors for all permutations of K levels from two to N total species. Average PCF values are calculated by taking the average of the concordance values at all ingroup nodes. For each K level, the taxonomic compositions are sorted to show the species groupings with the highest value of phylogeographic concordance. Taxon names have been truncated.

K   PCF average   Species Composition
5   0.63          M. biscu, E. semic, P. virid, S. alata, S. sarra
4   0.77          M. biscu, E. semic, S. alata, S. sarra
4   0.62          M. biscu, E. semic, P. virid, S. sarra
4   0.62          M. biscu, P. virid, S. alata, S. sarra
4   0.59          M. biscu, E. semic, P. virid, S. alata
4   0.59          E. semic, P. virid, S. alata, S. sarra
3   0.80          M. biscu, E. semic, S. sarra
3   0.79          M. biscu, S. alata, S. sarra
3   0.75          M. biscu, E. semic, S. alata
3   0.73          E. semic, S. alata, S. sarra
3   0.61          M. biscu, P. virid, S. sarra
3   0.57          M. biscu, E. semic, P. virid
3   0.56          M. biscu, P. virid, S. alata
3   0.56          P. virid, S. alata, S. sarra
3   0.55          E. semic, P. virid, S. sarra
3   0.54          E. semic, P. virid, S. alata
2   0.85          M. biscu, S. sarra
2   0.80          M. biscu, E. semic
2   0.79          M. biscu, S. alata
2   0.77          S. alata, S. sarra
2   0.74          E. semic, S. sarra
2   0.72          E. semic, S. alata
2   0.51          M. biscu, P. virid
2   0.47          P. virid, S. sarra
2   0.45          P. virid, S. alata
2   0.45          E. semic, P. virid
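The comparison of community compositions summarized in Table 5 can be reproduced with a few lines of code. The sketch below transcribes the table, confirms that it covers every subset of two to five species (26 models), and averages the PCF values at each K level for compositions without and with P. viridans. The variable names are illustrative, and the per-K means are a convenience summary rather than a statistic reported in the text.

```python
from itertools import combinations
from statistics import mean

SPECIES = ["M. biscu", "E. semic", "P. virid", "S. alata", "S. sarra"]

# (average PCF, species composition), transcribed from Table 5.
ROWS = [
    (0.63, "M. biscu, E. semic, P. virid, S. alata, S. sarra"),
    (0.77, "M. biscu, E. semic, S. alata, S. sarra"),
    (0.62, "M. biscu, E. semic, P. virid, S. sarra"),
    (0.62, "M. biscu, P. virid, S. alata, S. sarra"),
    (0.59, "M. biscu, E. semic, P. virid, S. alata"),
    (0.59, "E. semic, P. virid, S. alata, S. sarra"),
    (0.80, "M. biscu, E. semic, S. sarra"),
    (0.79, "M. biscu, S. alata, S. sarra"),
    (0.75, "M. biscu, E. semic, S. alata"),
    (0.73, "E. semic, S. alata, S. sarra"),
    (0.61, "M. biscu, P. virid, S. sarra"),
    (0.57, "M. biscu, E. semic, P. virid"),
    (0.56, "M. biscu, P. virid, S. alata"),
    (0.56, "P. virid, S. alata, S. sarra"),
    (0.55, "E. semic, P. virid, S. sarra"),
    (0.54, "E. semic, P. virid, S. alata"),
    (0.85, "M. biscu, S. sarra"),
    (0.80, "M. biscu, E. semic"),
    (0.79, "M. biscu, S. alata"),
    (0.77, "S. alata, S. sarra"),
    (0.74, "E. semic, S. sarra"),
    (0.72, "E. semic, S. alata"),
    (0.51, "M. biscu, P. virid"),
    (0.47, "P. virid, S. sarra"),
    (0.45, "P. virid, S. alata"),
    (0.45, "E. semic, P. virid"),
]

# Key each model by its (unordered) set of species.
models = {frozenset(names.split(", ")): pcf for pcf, names in ROWS}

# Every subset of two to five species should appear exactly once.
expected = {frozenset(c) for k in range(2, 6) for c in combinations(SPECIES, k)}
assert expected == set(models)

# Mean PCF at each K for compositions without and with the spider.
summary = {}
for k in range(2, 5):
    at_k = {m: v for m, v in models.items() if len(m) == k}
    without = [v for m, v in at_k.items() if "P. virid" not in m]
    with_spider = [v for m, v in at_k.items() if "P. virid" in m]
    summary[k] = (mean(without), mean(with_spider))

for k, (without, with_spider) in summary.items():
    print(k, round(without, 3), round(with_spider, 3))
```

At every K level the compositions that exclude the spider have the higher mean, which is the pattern the text describes.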

Results from other systems

We analyzed comparative data sets in two additional systems with the PCF framework. In the PNW, results are consistent with the basic patterns described by Carstens et al. (2005), and reinforce previous assertions of phylogeographic congruence (PCF ≥ 0.98) among the three amphibian species (Fig. 9A). PCFs identify the vole as an outlier to the community-level pattern, as pairwise comparisons that include the vole have low average PCF values (~0.5; Fig. 9B). PCF values also decrease when the willow tree is included with the amphibians (Appendix B.12). In the Neotropical bird data set, results show generally high levels of phylogeographic congruence among the species (Fig. 9C; Appendix B.13), with all pairwise comparisons above 0.75 (Fig. 9D). This demonstrates relatively similar topological patterns among the species (see Smith et al. 2014b for divergence time comparisons among species).


Figure 8. Community pattern of diversification estimated with phylogeographic concordance factors. Panel A represents the community tree estimated in BUCKy. Concordance factors on top reflect when all five species are used; concordance factors on bottom are with P. viridans removed from the PCF analysis. Panel B represents the maximum clade credibility (MCC) trees of the *BEAST results from each species, providing a visual perspective of the two recovered groups. Posterior probability values greater than 0.95 are represented by an asterisk. Species names have been truncated for illustration purposes.

3.3.4. Tests of Simultaneous Divergence in S. alata System

Results from PyMsBayes suggest there have been multiple divergence episodes in this system (Table 6). The posterior distribution, however, is similar to the prior distribution, suggesting a lack of information present in the data. These results do not appreciably change when effective population sizes are treated as equal. To further explore the sensitivity of the analyses to the priors, the prior distribution on divergence episodes was adjusted to reflect a community where nearly all species diverged idiosyncratically (centered around six divergence episodes). Results from these analyses (one containing a different population size for each population, the other with the same parameter for all three populations) show posterior distributions once again similar to the prior distributions (Appendix B.14). While there is little to no support for a single episode of divergence, it appears the data are insufficient for such a parameter-rich model.

Table 6. Results from PyMsBayes analyses. Prior distribution is centered around three divergence episodes, consistent with our prior beliefs on the system. Posterior distributions are very similar to the prior distribution, regardless of how population sizes (ancestral and daughter) are treated.

Divergence   Unique               θA ≠ θD      θA = θD
Episodes     models    Prior      Posterior    Posterior
1            1         0.1170     0.1190       0.1550
2            63        0.2710     0.2910       0.3100
3            301       0.3098     0.2860       0.2930
4            350       0.2020     0.2030       0.1730
5            140       0.0776     0.0770       0.0610
6            21        0.0202     0.0210       0.0050
7            1         0.0024     0.0030       0.0030
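Two quick checks on Table 6 may help the reader. First, the "Unique models" column matches the number of ways to partition seven taxa into k non-empty groups (Stirling numbers of the second kind). Second, the similarity between prior and posterior can be expressed as a total variation distance; this summary is an illustrative addition here and was not part of the original analysis.

```python
def stirling2(n, k):
    """Number of ways to partition n taxa into k non-empty divergence episodes."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def total_variation(p, q):
    """Half the summed absolute difference between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


unique_models = [stirling2(7, k) for k in range(1, 8)]

# Probabilities transcribed from Table 6 (one to seven divergence episodes).
prior = [0.1170, 0.2710, 0.3098, 0.2020, 0.0776, 0.0202, 0.0024]
post_unequal = [0.1190, 0.2910, 0.2860, 0.2030, 0.0770, 0.0210, 0.0030]
post_equal = [0.1550, 0.3100, 0.2930, 0.1730, 0.0610, 0.0050, 0.0030]

print(unique_models, sum(unique_models))
print(round(total_variation(prior, post_unequal), 4),
      round(total_variation(prior, post_equal), 4))
```

Both distances are small (below 0.1 on a scale from 0 to 1), consistent with the conclusion that the data moved the posterior very little from the prior.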


[Figure 9 (image not reproduced). Panels A and C: community trees with tips NRMn, NRMs, NC, SC (Pacific Northwest) and CA, CH, NA, SA (Neotropical birds), annotated with concordance factors. Panels B and D: average PCF value (y-axis) for each pairwise model (x-axis), with points labeled by group (A, M, P; C, U).]

Figure 9. Phylogeographic concordance factors estimated in two additional data sets: Pacific Northwest and South and Central America. Panels A and B are results from Carstens et al. (2005); Panels C and D are results from Smith et al. (2014b). Panel A shows the community tree for the PNW, with the deepest split between the Cascade and Rocky Mountains, and moderate support at both nodes. Panel B shows all pairwise comparisons; all models containing two amphibians (A) show essentially identical phylogeographic patterns, while models containing the small mammal (M) reflect high levels of discordance. Panel C shows all four bird species contain a CA, CH sister relationship, with less support for an NA, SA relationship. Panel D shows all pairwise comparisons have a minimum average PCF value of 0.75, reflecting relatively high concordance of topological relationships among all bird species. Birds are categorized as living in the canopy (C) or understory (U); see Smith et al. (2014b) for sampling details.


3.4. DISCUSSION

3.4.1. Phylogeographic Concordance Factors Allow Congruence among Taxa to be Quantified

Phylogeographic concordance factors allow the degree of phylogeographic congruence to be quantified across multiple species, enabling researchers to identify species that share a pattern of co-diversification. To date, comparative phylogeographic studies have commonly compared phylogenies across multiple taxa in a qualitative manner, and interpreted similarity as a shared response to a historic event (Avise 2000). However, to paraphrase Tolstoy, phylogenies that are not identical are each incongruent in their own unique way. So long as researchers rely on qualitative interpretations of congruence, comparative phylogeography will be prone to over-interpretation (Knowles and Maddison 2002).

PCFs provide two important contributions to comparative phylogeography. First, PCFs allow researchers to quantify the amount of congruence among the phylogeographic histories of co-distributed species. This method can be applied to co-distributed species within certain taxonomic groups (e.g., birds, reptiles, mammals), as has been historically done in comparative phylogeographic research, and can help identify earth history processes that have promoted diversification and structured genetic and taxonomic variation in similar ways. When applied to an ecological community, however, they allow for an assessment of the correlation between ecological association and evolutionary history. Although multiple macroevolutionary processes can influence phylogenetic congruence (e.g., dispersal, selection, speciation and extinction), a null prediction is that ecological dependence is correlated with phylogenetic congruence (Clayton et al. 2004). While this has been best demonstrated in host / parasite (Hafner et al. 2003) and plant / arthropod interactions (Farrell and Mitter 1990; McKenna et al. 2009), ecological communities that consist of many mutualist species represent potentially informative systems for phylogeographic investigation. In the S. alata system, arthropod species that have a tighter ecological association with the pitcher plant exhibit higher average PCF values, a pattern predicted based on ecology (Fig. 7). Second, PCFs allow the researcher to potentially identify taxa that do not fit the overall pattern of community diversification. If evolutionary processes have strongly influenced diversification in a region, or the community contains tight ecological associations, it may be reasonable to expect that taxa share a phylogeographic history. By comparing different models of community composition, PCFs allow for the identification of groups of organisms that have responded in similar ways to historical events and, for species that share ecological associations, of communities whose taxa have sustained their ecological relationships through evolutionary time. For an example of the latter, Table 5 shows the average phylogeographic concordance factors across 26 models, varying in taxonomic composition. All models that exclude P. viridans contain PCF values within the 90% HPD of expected values (see Appendix B.8), but models that include P. viridans are not contained within the range of values expected if the species share a phylogeographic history (based on our simulations at the K = 2 level). This compelling result corroborates the idea that the spider is an opportunistic predator taking advantage of this specialized habitat when present, but does not rely on this habitat for survival and reproduction.

Phylogeographic concordance factors provide a flexible approach for comparative phylogeography because they are calculated directly from species tree distributions. PCFs can be calculated from single or multilocus data sets (as demonstrated here), based on either full sequence data or single nucleotide polymorphisms (SNPs). While we analyzed single locus data for the arthropod species, multilocus data are preferable for PCF analysis, as data gathered from multiple independent nuclear markers will improve the estimate of phylogenetic history (Rokas and Carroll 2005). Single locus data have recognized shortcomings for phylogeography, particularly for parameter estimation (Edwards and Beerli 2000; Felsenstein 2006), but the COI gene (used here) represents an informative marker (e.g., Hebert et al. 2003). Multilocus data (eight loci) were analyzed for the host plant, and the PCF framework allows for the quantitative integration of analyses using both types of data. The use of the species tree framework also greatly reduces the impact of phylogenetic error that results from the stochastic process of allele coalescence, a requirement given that most phylogeographic research is focused on recent time scales (e.g., Pleistocene Epoch). The Bayesian inference framework accounts for phylogenetic uncertainty; by utilizing a posterior distribution of trees, error inherent to point estimates is thus avoided. The reliance on posterior distributions of trees, however, should be taken into account when interpreting the absolute values of the PCFs.

An average PCF value of 1.0 (i.e., perfect phylogeographic concordance) can only be achieved if all trees in the posterior distributions across all species are topologically identical. As this is highly unlikely to ever be the case, species are unlikely to reflect average PCF values of 1.0, even if they have co-diversified. For example, our results when testing simulated data sets that came from the same tree highlight this issue, as an average PCF value of 1.0 was never recovered (max = 0.9973). The median PCF average value under these conditions was 0.86, reflecting high phylogeographic concordance, with values down to 0.71 still found within the 90% HPD (even though they were simulated from the same tree). Depending upon the process of diversification and speciation in the taxa, expectations may vary and a range of values could be indicative of a pattern of co-diversification. As the simulation conditions used here matched the empirical data, increasing the number of loci will improve the phylogenetic estimate of the species, and will likely increase the average PCF values for co-distributed taxa if they do share a pattern of co-diversification.
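The reason a value of 1.0 is out of reach can be seen in a toy calculation. The sketch below is not the Bayesian concordance analysis performed in BUCKy; it simply averages, across species, the fraction of posterior trees that contain a given clade. The tip names and posterior samples are hypothetical.

```python
from statistics import mean


def clade_support(tree_sample, clade):
    """Fraction of sampled trees that contain `clade`."""
    return sum(clade in tree for tree in tree_sample) / len(tree_sample)


# Each tree is represented by the set of clades it contains (hypothetical tips).
west, east = frozenset({"W1", "W2"}), frozenset({"E1", "E2"})
shared_tree = {west, east}
other_tree = {frozenset({"W1", "E1"}), frozenset({"W2", "E2"})}

# Hypothetical posterior samples of 10 trees for two species that share a history.
species_a = [shared_tree] * 9 + [other_tree]
species_b = [shared_tree] * 8 + [other_tree] * 2

average_support = mean(clade_support(sample, west)
                       for sample in (species_a, species_b))
print(average_support)
```

Both species favor the same tree, yet the averaged support for the western clade is about 0.85 because each posterior sample carries some topological uncertainty.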

To demonstrate the efficacy and flexibility of phylogeographic concordance factors, we applied the method to two additional systems. The three data sets span the ecological continuum, from species within the same taxonomic group (Neotropical birds), to species from disparate taxonomic groups with some sharing habitat requirements (PNW), to species from disparate taxonomic groups with some tightly linked through ecology (S. alata). The pitcher plant system may be thought of as a best case scenario, where strong ecological interactions have manifested themselves in shared phylogeographic patterns; however, high PCF values are also seen in the other two systems. As these additional data sets contrast with the pitcher plant system (in their ecological interactions), they allow us to tease apart how evolutionary and ecological processes contribute to species interactions over historical time. For example, habitat requirement explains why the three amphibians (in the PNW) share nearly identical phylogeographic histories, as there is no evidence that they are ecologically dependent on one another. In total, these analyses demonstrate that PCFs are broadly applicable to a diverse range of phylogeographic data. Because they are broadly applicable, they provide an important tool for synthesis between comparative phylogeography and ecology.

3.4.2. The Mississippi River is a Driver of Diversification

The Mississippi River is a well-characterized biogeographic barrier (Soltis et al. 2006), and has had a strong effect on the S. alata system, restricting gene flow across this large body of water for many taxa. Strong population genetic structure is seen among most arthropod species (GST values and AMOVA results; see Tables 3 and 4), suggesting that the landscape processes that promote diversification within S. alata have a similar influence on the associated arthropods. The PCF analysis provides strong evidence in support of this finding. Shared phylogeographic patterns are seen with the fly (S. sarraceniae), the mite, and the moth, as compared to the host plant (Fig. 7), as well as when estimating a community phylogeny (Fig. 8A). While the reciprocal monophyly observed in the gene trees for four of the six arthropods further supports the isolation of populations on either side of this biogeographic barrier, the PCF analysis indicates that all of the major rivers that bisect the range of S. alata play a similar role in inhibiting gene flow. In addition, the PCF analysis complements other comparative phylogeographic approaches, such as hABC tests of simultaneous divergence.


We used PyMsBayes to test for simultaneous divergence across the Mississippi River. Unlike PCFs, which make inferences based on topology only, PyMsBayes estimates divergence times for all taxa and estimates the posterior probability of the number of divergence episodes for a given system. Implementation of ABC analyses is challenging, as multiple steps in the process can lead to incorrect inference, including the selection of summary statistics (Robert et al. 2011) and models (Pelletier and Carstens 2014) to include in the analysis. Results from analysis of the S. alata system with PyMsBayes were inconclusive, as the posterior distribution aligned closely to the prior distribution. Although the model was informed by species-specific information from the literature (see Appendix B.16), it is likely that patterns in the mtDNA are driving these results, specifically that four of the six arthropod species are reciprocally monophyletic across the Mississippi River in their ML tree estimates (Appendix B.2–B.7); S. alata is also monophyletic in the eastern region of the chloroplast ML tree (Appendix B.9). Estimating population divergence times is exceedingly difficult under this scenario, as different combinations of ancestral population size and gene divergence time can lead to the same population splitting event. The confounding of these two parameters (with single locus data) leads to a large variance in divergence time estimation, necessarily making this approach uninformative in our study, regardless of how we designed the analysis (Table 6; Appendix B.14–B.15). It appears that the data collected here, when summarized following Hickerson et al. (2006), contain little information as to the number of divergence events in this community when estimating temporal synchronicity of population splitting times in a parameter-rich model.
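The confounding described above can be made concrete with a small calculation. Under an isolation-only model, two lineages sampled from different daughter populations cannot coalesce until they reach the ancestral population, so their expected divergence is the population split time plus the expected coalescence time in the ancestor. The sketch assumes a Wright-Fisher population and, by default, a diploid autosomal locus (2N generations); for a haploid marker the same logic applies with `ploidy=1`. All numerical values are hypothetical.

```python
def expected_gene_divergence(tau, ancestral_size, ploidy=2):
    """Expected coalescence time (in generations) of two lineages, one from
    each daughter population, under an isolation-only model."""
    return tau + ploidy * ancestral_size


# Hypothetical (split time, ancestral size) pairs that are indistinguishable
# on the basis of the expected gene divergence alone.
scenarios = [(100_000, 200_000), (300_000, 100_000), (450_000, 25_000)]
for tau, size in scenarios:
    print(tau, size, expected_gene_divergence(tau, size))
```

An old split with a small ancestral population and a recent split with a large ancestral population yield the same expectation, which is why a single reciprocally monophyletic locus carries so little information about the split time itself.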


PCFs are strictly a spatial test and do not explicitly account for the timing of divergence. In combination with the results of the hABC analysis (when taken at face value), the taxa that share the same phylogeographic break in the pitcher plant system may have diversified at different times, although a lack of signal in the data highlights the difficulty in robustly estimating times of divergence (Arbogast et al. 2002). Since the biggest shortcoming of the single locus data is their inability to estimate population divergence times (Edwards and Beerli 2000), multilocus data are likely to improve our ability to test for co-diversification, where shared spatial patterns can be further assessed for temporally concordant divergences. A full analysis can be seen with the Neotropical birds, where inferences based on topology (with PCFs) complement, but differ somewhat from, inferences based on divergence times (see Smith et al. 2014b). PCFs provide a quantitative tool for the comparison of topological patterns; we envision the addition of this novel method to the phylogeographer’s toolbox, where analysis of divergence times and population sizes (in addition to PCFs) can provide a full understanding of diversification dynamics in co-distributed species.

3.4.3. Insights into the Sarracenia alata Pitcher Plant System

Population structure has been demonstrated in S. alata at both shallow and deep levels in the population tree (Koopman and Carstens 2010; Zellmer et al. 2012; Carstens and Satler 2013). Given the arthropod community that interacts ecologically with the plant (Folkerts 1999), ecological association predicts phylogeographic concordance among the community members. Congruent population structure is recovered across the fly, moth, and mite, which suggests that the evolutionary processes shaping genetic variation in the Gulf Coast have had a similar effect on those species as on the host plant. All three of these arthropods have high affinity for the pitcher plant habitat: E. semicrocea spends its entire life cycle within the plant, only moving around at night for short-range dispersal and mating, and both the moth and S. sarraceniae (Dahlem and Naczi 2006) depend upon the pitcher for larval deposition, with their larvae feeding and developing within the pitcher tubes. The mite largely preys upon nematodes and smaller mites within S. alata's pitcher, and is seldom collected outside of this environment (Muma and Denmark 1967). All of these species have a high degree of phylogeographic concordance (Fig. 8). In contrast, there is little evidence that the spiders are congruent with S. alata. Both spiders are opportunistic predators and widespread species that are not ecologically dependent on S. alata. Both also exhibit ballooning (Foelix 1982), a mechanism for long-range dispersal, and this behavior, in combination with their lack of ecological dependency on S. alata, likely explains the lack of concordance demonstrated here.

3.5. CONCLUSIONS

The Sarracenia alata pitcher plant system exhibits, at the broadest scale, shared phylogeographic patterns of multiple arthropods suggestive of co-diversification. Ecological relationships within the system predict the level of phylogeographic congruence among the species, demonstrating that the flies, moth, and mite have a shared phylogeographic history with the host plant. The spiders, which are not ecologically dependent on S. alata and are capable dispersers, show no evidence of co-diversification.

To better understand the history of this community, we introduced phylogeographic concordance factors. PCFs quantify phylogeographic congruence among species, whether in pairwise comparisons between individual species and the host plant (Fig. 7), among all species (Fig. 8A), or permutations of all combinations of species at varying K levels (Table 5). This novel approach can utilize the posterior distribution from any Bayesian species tree method, and is applicable to differing amounts and types of phylogeographic data, including full sequences from any number of loci as well as SNPs (through the use of a program like SNAPP; Bryant et al. 2012). This inherent flexibility enables PCFs to be applied to data collected from thousands of previously published phylogeographic studies, making a valuable contribution to our understanding of how evolutionary processes have shaped and structured taxonomic and genetic variation within a region.

3.6. PUBLICATION INFORMATION

This chapter was previously published as: Satler JD, Carstens BC (2016) Phylogeographic concordance factors quantify phylogeographic congruence among co-distributed species in the Sarracenia alata pitcher plant system. Evolution, 70, 1105–1119 (doi: 10.1111/evo.12924).


Chapter 4: Do ecological communities disperse across biogeographic barriers as a unit?

4.1. INTRODUCTION

Comparative phylogeographic investigations can elucidate the historical processes that shape and structure biological diversity. A common approach is to infer population genetic structure and estimate other parameters in a geographic context that devotes particular attention to biogeographic barriers. Similarities in the demographic histories across species are indicative of a shared response to landscape changes (Avise et al. 1987; Sullivan et al. 2000), while idiosyncratic patterns suggest an independent response.

Although this framework for comparative phylogeography is enticing due to its simplicity, the details of how to compare demographic histories across species are key.

Initial approaches for this comparison utilized gene trees, with similarity in pattern being suggestive of a common response to a historical event (Avise 2000; Arbogast and

Kenagy 2001). While shared spatial patterns can indicate a similar response, temporal information is necessary to demonstrate a shared response in time (Edwards and Beerli

2000). Subsequent researchers have generally taken one of two approaches, either by estimating divergence times independently from each species for comparison (e.g.,

Carstens et al. 2005; Smith et al. 2012) or by using probabilistic models to estimate the

number of divergence episodes (e.g., Hickerson et al. 2006, 2007). In addition to methodological considerations such as these, the nature of the biogeographic barrier itself is an important consideration.

Hard biogeographic barriers contribute to the diversification of biota by physically blocking the movement of individuals, and therefore gene flow (Pyron and

Burbrink 2010). Such barriers provide an opportunity to understand how communities responded to a historical event, particularly in cases of vicariance. One compelling example is the formation of the Isthmus of Panama, which is estimated to have occurred around 3 Ma

(Coates et al. 2005; but see Bacon et al. 2015 for an alternative interpretation). Although the formation of the isthmus led to a migration corridor for terrestrial organisms (i.e., the

Great Biotic Interchange; Marshall et al. 1982), the land bridge represents a hard barrier for marine organisms that drastically altered marine environments on the Pacific and

Caribbean sides (Leigh et al. 2014). Phylogeographic studies of marine organisms demonstrate that most geminate species diverged prior to the formation of the barrier

(Cowman and Bellwood 2013), but did so asynchronously, illustrating that intrinsic differences across species played a role in their response to this geological process

(Knowlton and Weigt 1998; Lessios 2008). Another example of a hard barrier is in Baja

California, where a phylogeographic break is recovered in many taxa splitting the northern and the southern biotas of the peninsula (vicinity of the Vizcaino Desert 28 –

30° N latitude; e.g., Riddle et al. 2000; Garrick et al. 2009). Upton and Murphy (1997) proposed the presence of a mid-peninsular seaway 1 Ma to explain genetic patterns recovered in the side-blotched lizards (genus Uta). Although geological evidence is

lacking (Hafner and Riddle 2005) and support is tenuous for the presence of this seaway

(Crews and Hedin 2006; Leaché et al. 2007), patterns in the phylogeographic histories of many taxa support a vicariant event in this region, for which the most parsimonious explanation is an ancient seaway (Lindell et al. 2006). Like the rise of the Isthmus of

Panama, the formation of this putative seaway likely occurred after the ancestor of these species occupied Baja California.

Other biogeographic barriers are more porous. For example, Alfred Russel

Wallace observed discontinuities in the distributions of various monkey species in the

Amazon basin, and proposed the riverine barrier hypothesis, where rivers act as barriers promoting genetic and taxonomic divergence (Wallace 1852). Although rivers are physical barriers for some taxa, illustrating the importance of intrinsic species characteristics, they are dynamic entities capable of drastic change in both flow rate and direction. Rivers are expected to be stronger agents of isolation as distance from the headwaters increases, as the wider the river the more challenging it is for many terrestrial taxa to disperse across this aquatic environment. Consequently, support for riverine barriers has been mixed, with some investigations supporting their isolating abilities (e.g.,

Burbrink et al. 2000; Jackson and Austin 2010) and others demonstrating their permeability (e.g., da Silva and Patton 1998; Funk et al. 2007). Given the dynamic nature and potential permeability of rivers, dispersal and gene flow may be important factors influencing population genetic structure of species spanning these aquatic barriers.

The landscape of the southeastern United States is dominated by major rivers, including the Mississippi River, the largest river system in North America (Coleman


1988). The Mississippi River has its origins in the Mesozoic era, with a stream present since the Jurassic (Mann and Thomas 1968). While this ancient river has been identified as a biogeographic barrier and implicated in isolating populations on either side in a variety of taxa (reviewed in Soltis et al. 2006), it is necessarily a porous barrier, as numerous species younger than the formation of the river are distributed on each side.

This permeability could be influenced by intrinsic factors, such as species-specific dispersal abilities, or extrinsic factors, such as physical changes to the river. In particular, sea level fluctuations strongly influenced the Mississippi River during the Pleistocene

(Coleman 1988), and dramatic changes in the course of the river channel were common during the Quaternary (Mann and Thomas 1968). In addition, oxbow lake formation can transfer parts of land from one side of the river to the other, and has been invoked to explain species’ distributions that span rivers (Gascon et al. 2000). In contrast to hard biogeographic barriers, the age and permeability of the Mississippi River necessitates that east–west divergence in terrestrial taxa would be due to dispersal and colonization, not vicariance (Pyron and Burbrink 2010). Investigating these processes in an ecological community of interacting organisms should provide insight into how rivers act as biogeographic barriers, particularly when the nature of species interactions is known.

Here we investigate the riverine barrier hypothesis using the Sarracenia alata pitcher plant community. The carnivorous pitcher plant S. alata is restricted to bogs and fens in longleaf pine savannahs along the Gulf Coast from eastern Texas to western

Alabama, with a distribution bisected by the Mississippi River and Atchafalaya basin.

Leaves of the plant are pitcher-shaped, an adaptation for the capture and digestion of prey


(e.g., Darwin 1875; Ellison and Gotelli 2001). In addition, these leaves harbor non-prey species (inquilines) that interact ecologically with the plant (reviewed in Adlassnig et al.

2011). In particular, a distinct arthropod community is associated with Sarracenia

(Folkerts 1999). Some of these species rely upon the pitcher plant for all of their life cycle (e.g., flesh flies, moths), while others are opportunistic predators that intercept prey from the plant (e.g., spiders) but are not restricted to this unique habitat. If rivers such as the Mississippi act as porous biogeographic barriers, we predict that the demographic history of the arthropods in this community will match that of the plant in a manner that reflects these ecological relationships (Satler and Carstens 2016).

Genetic investigations into S. alata have demonstrated that genetic variation in the plant is structured largely due to the influence of multiple rivers that divide its range into several regions. Results from chloroplast DNA, microsatellites, and SNP data (Koopman and Carstens 2010; Zellmer et al. 2012) indicate that population genetic structure is largely promoted by major rivers, and estimates of the pattern of diversification demonstrate that the deepest split in the population tree corresponds with the Mississippi

River and dates to the middle Pleistocene. Results from species delimitation analyses suggest that populations on either side of this river constitute independent evolutionary lineages (Carstens and Satler 2013). Furthermore, investigations into the microbiome have demonstrated that some of the bacteria and other microorganisms exhibit genetic structure that mirrors that of the host plant (Koopman and Carstens 2011; Satler et al.

2016). Taken as a whole, there are compelling reasons to predict that the S. alata ecological community has dispersed in a concerted fashion across the Mississippi River.


Specifically, we predict that estimates of divergence time across this barrier for species that are ecologically dependent upon the plant will be similar to those from S. alata, and that these species will also exhibit broad similarities in estimates of population genetic parameters (e.g., Ne and

2Nm), whereas those species not dependent on the plant will exhibit idiosyncratic patterns in many population genetic parameters.

4.2. MATERIAL AND METHODS

4.2.1. Taxon Sampling

Five arthropod species were collected from 12 possible localities throughout S. alata’s distribution (Fig. 6; see Appendix C.4 for sampling information). Arthropods included two flesh flies (Fletcherimyia celarata, Sarcophaga sarraceniae), a moth (Exyra semicrocea), and two spiders (Misumenoides formosipes, Peucetia viridans). The flesh flies and moth are considered obligate commensals, whereas the spiders are opportunistic capture interrupters (Folkerts 1999). All individuals were captured in the vicinity of the plant, usually resting on or just under the lid of the pitcher. For the flesh flies, males were pinned in the field with their genitalia extracted to confirm proper identification. From those specimens, three legs were removed and placed in 95% EtOH for DNA preservation. All other specimens were preserved directly in 95% EtOH for DNA preservation.

4.2.2. DNA Sampling

Genomic DNA was extracted from the specimens (either leg tissue or full body soakings) using a Qiagen DNeasy kit. Between 24 and 26 individuals were selected per

species for sequencing, with samples that span the distribution of the species and roughly equal numbers on either side of the Mississippi River (see Appendix C.4). A double digest restriction-site associated DNA sequencing (ddRADseq) protocol (modified from

Peterson et al. 2012) was used to generate genomic sequences. Specifically, genomes were digested with two restriction enzymes (SbfI and MspI) to reduce the number (but generate higher coverage) of the potential suite of homologous loci. Following restriction enzyme digest and ligation of internal barcodes, libraries were amplified through polymerase chain reaction. After confirming amplification of the sequence libraries via gel electrophoresis, size selection was conducted with a Blue Pippin (Sage Sciences) targeting fragments between 300 and 600 base pairs. Samples were quantified using a bioanalyzer and qPCR to confirm quality library prep, and sequenced using an Illumina

HiSeq with single end 100 base pair reads.

Raw sequence reads were analyzed with pyRAD v3.0.0 (Eaton 2014) using parameter settings that were consistent for all five species. PyRAD is an automated pipeline that takes as input raw sequence reads and outputs loci, alleles, and SNPs. Base calls with a Phred score below 20 were replaced with Ns; up to four Ns were allowed for a read to be retained. A clustering threshold of 90% was used to assemble reads into loci.
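The quality filter described above can be illustrated with a minimal sketch. This is not pyRAD's own code: the function names are hypothetical, and the sketch assumes FASTQ quality strings with a Phred offset of 33. It masks base calls below a Phred score of 20 and retains a read only if it contains at most four Ns.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (offset 33 is assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def mask_read(sequence, quality_string, min_phred=20, max_n=4):
    """Mask low-quality base calls and decide whether the read is kept.

    Bases with a Phred score below min_phred become 'N'. The masked read is
    returned if it holds at most max_n Ns; otherwise None is returned.
    """
    scores = phred_scores(quality_string)
    masked = "".join(
        base if score >= min_phred else "N"
        for base, score in zip(sequence, scores)
    )
    return masked if masked.count("N") <= max_n else None


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
print(mask_read("ACGTACGT", "IIII#III"))  # one base masked, read kept
print(mask_read("ACGTACGT", "#####III"))  # five bases masked, read dropped
```

Ns already present in the raw read count toward the limit in this sketch, since the check is applied to the masked sequence.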

RAD sequencing is prone to missing data due to mutations in the restriction enzyme sites as well as allelic dropout, and this missing data can bias parameter estimates in downstream analyses (Arnold et al. 2013). However, retaining only those loci with 100% coverage can also bias parameter estimates, because such loci are likely to be evolving more slowly than the genome-wide average; we therefore allowed for some missing data in our analysis. For all species except M. formosipes, we retained loci with a minimum of 75% coverage across all individuals. A 50% threshold was used for M. formosipes due to higher amounts of missing data, which we attribute to difficulties in extracting high quality DNA from this species.
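As a small illustration of the coverage thresholds, the sketch below keeps a locus only when the fraction of individuals with data meets a minimum. The function name and the toy data set are hypothetical; in practice this filter is applied inside the assembly pipeline rather than by a separate script.

```python
def retain_loci(locus_samples, n_individuals, min_coverage=0.75):
    """Keep loci genotyped in at least min_coverage of the individuals.

    locus_samples maps a locus name to the individuals with data at that locus.
    """
    return [
        locus
        for locus, samples in locus_samples.items()
        if len(set(samples)) / n_individuals >= min_coverage
    ]


# Hypothetical data set of four individuals and three loci.
loci = {
    "locus_1": ["a", "b", "c", "d"],  # 100% coverage
    "locus_2": ["a", "b", "c"],       # 75% coverage
    "locus_3": ["a", "b"],            # 50% coverage
}
print(retain_loci(loci, 4))                     # 75% threshold
print(retain_loci(loci, 4, min_coverage=0.50))  # relaxed 50% threshold
```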

4.2.3. Population Genetic Structure

To infer population genetic structure within the species, we used STRUCTURE v2.3.4 (Pritchard et al. 2000), which assigns individuals to clusters so as to minimize departures from Hardy–Weinberg and linkage equilibrium within clusters. Our analyses were conducted at the K = 2 clustering level, reflecting our understanding that populations of S. alata east and west of the Mississippi River comprise two distinct lineages (Zellmer et al. 2012; Carstens and Satler 2013). Our prediction, particularly for obligate commensal species, is that genetic structure at this level will also reflect this east–west division. For each species, we converted allelic data into haplotypes at each locus, utilizing the information contained in linked SNPs when more than one SNP was present within a locus. If any allele contained one or more Ns, we adopted the conservative approach of treating this sequence as missing due to ambiguity in allelic assignment. Analyses were conducted using an admixture model with correlated allele frequencies, sampling location information for each species, a burn-in of

1 × 10^5 generations and subsequent sampling for 5 × 10^5 generations. Each analysis was repeated 10 times, and results were processed and summarized with the pophelper package (Francis 2016) in R (R Core Team 2015).


An Analysis of Molecular Variance (AMOVA; Excoffier et al. 1992) was conducted on each species to assess the level of genetic partitioning across the landscape.

Specifically, we tested for genetic partitioning (i) within each locality, (ii) between localities on either side of the Mississippi River, and (iii) within each side of the river.

STRUCTURE haplotype files were converted to Arlequin files using PGDspider v2.0.7.1

(Lischer and Excoffier 2012). AMOVA analyses were conducted in Arlequin v3.5.1.2

(Excoffier et al. 2005), with distance matrices calculated using the number of different alleles per locus and 10,000 permutations to assess significance.

4.2.4. Estimating Population Size, Population Divergence, and Gene Flow

In S. alata, populations on either side of the Mississippi River are highly differentiated, and this population genetic structure dates to the middle Pleistocene

(>120,000 years ago; Zellmer et al. 2012). Species tree patterns suggest that multiple arthropods are phylogeographically concordant with S. alata (Satler and Carstens 2016).

Since we are interested in understanding temporal divergence in this investigation, we estimated population divergence (τ), population size (Ne), and gene flow (2Nm) to investigate whether the S. alata ecological community diversified in a concerted manner.

Methods that utilize the allele frequency spectrum (AFS) to analyze SNP data have recently been introduced (Gutenkunst et al. 2009; Excoffier et al. 2013). These model-based methods summarize biallelic SNP data using an allele frequency spectrum to infer population genetic parameters. One recently developed method, fastsimcoal2

(FSC2; Excoffier et al. 2013), implements coalescent simulations to calculate the

likelihood of the observed AFS given a demographic model using the likelihood calculation developed by Nielsen (2000). Simulations suggest that FSC2 is computationally efficient and produces accurate parameter estimates (Excoffier et al.

2013). As models are user-specified and not restricted to a certain number of populations, the flexibility of FSC2 makes it appealing to apply to the analysis of data from non-model species where the correct model is unknown (Thomé and Carstens 2016).

Model selection has become an integral part of phylogeography (e.g., Fagundes et al. 2007; Carstens et al. 2013). In order for parameter estimates to accurately capture the key biological processes of a given system, an assessment of the statistical fit of models representing a variety of parameters is warranted. Because populations of S. alata have been isolated on either side of the Mississippi River for a considerable amount of time

(Zellmer et al. 2012; Carstens and Satler 2013), we assumed a two population model, grouping samples on either side of the biogeographic barrier into populations. We then generated a suite of models, containing different combinations of parameters (e.g., τ, Ne,

2Nm), to identify models that offer a good fit to the data. FSC2 calculates a composite likelihood with the assumption that SNPs are in linkage equilibrium, and thus any linkage may bias this calculation and invalidate model comparisons. To satisfy this assumption, we randomly selected one SNP per locus to generate an unlinked AFS. We then conducted model selection on seven isolation-with-migration (IM) models (Fig. 10) using

Akaike information criterion (AIC; Akaike 1974), and calculated model probabilities using information theory following Burnham and Anderson (2002) for each species.

Parameter estimates were subsequently generated via model-averaging (i.e., weighted by

the probabilities for each of the models), allowing for estimates of a particular parameter to contribute to the overall parameter estimate in proportion to its model probability.

Although using unlinked SNPs reduced the number of entries in the AFS, it allowed us to account for model uncertainty when estimating the parameters of interest. Since we were concerned that reducing our data set to only unlinked SNPs might leave us with too few

SNPs to accurately estimate parameters of interest, we also estimated parameters using the traditional IM model (Fig. 10C) for each species incorporating all of the SNPs.

Linkage among SNPs affects the calculation of the likelihood, not parameter estimation, so linked SNPs are not expected to bias parameter estimation when a single model is used.
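The model probabilities and model-averaged estimates described above follow the standard Akaike-weight calculation of Burnham and Anderson (2002), which can be sketched as follows. The function names and the input values are hypothetical. The sketch assumes natural-log likelihoods as input, and it leaves to the caller the choice of what value a model contributes for a parameter it does not contain (for example, zero migration in an isolation-only model); how this was handled in the analyses here is not specified in the text.

```python
import math


def akaike_weights(log_likelihoods, n_params):
    """Return model probabilities (Akaike weights) from ln-likelihoods.

    AIC = 2k - 2 lnL, and each weight is proportional to
    exp(-0.5 * (AIC - min AIC)).
    """
    aic = [2 * k - 2 * lnl for lnl, k in zip(log_likelihoods, n_params)]
    best = min(aic)
    relative = [math.exp(-0.5 * (a - best)) for a in aic]
    total = sum(relative)
    return [r / total for r in relative]


def model_average(estimates, weights):
    """Weight each model's estimate of a parameter by its model probability."""
    return sum(e * w for e, w in zip(estimates, weights))


# Hypothetical example with two models that have identical AIC scores.
weights = akaike_weights([-100.0, -101.0], [3, 2])
print(weights)
print(model_average([100000.0, 200000.0], weights))
```

Subtracting the minimum AIC before exponentiating keeps the calculation numerically stable when likelihoods are very small, as is typical for composite likelihoods computed from thousands of SNPs.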


[Figure 10 panels: A) Model 1 - ISO; B) Model 2 - ISOg; C) Model 3 - IM; D) Model 4 - IMg; E) Model 5 - IMWE; F) Model 6 - IMEW; G) Model 7 - Island. Panels A through F show an ancestral population (NeANC) splitting into western (NeW) and eastern (NeE) populations; panel G shows only the western and eastern populations.]

Figure 10. Models used in FSC2 analyses, all variations of the isolation-with-migration model (panel C). Models varied in their included parameters, from divergence, to migration, to population size change. Models are as follows: A) Isolation only, B) Isolation with population size change in daughter populations, C) IM model with symmetric migration, D) IM model with symmetric migration and population size change in daughter populations, E) IM model with migration from west to east, F) IM model with migration from east to west, and G) Island model. In models allowing population size change (B, D), daughter populations were allowed to contract or expand, with the process of population size change occurring independently in the two populations. In models allowing gene flow in both directions (C, D, G), migration rates were independent.


Analyses were conducted with fastsimcoal v25221 (Excoffier et al. 2013). We built a folded allele frequency spectrum from the counts of the minor allele, as we did not have sequence data from outgroups. Fixed numbers of alleles for all populations are required for generating the observed AFS; however, only including SNPs with 100% coverage would drastically reduce (and likely bias) our sampling. To account for missing data, yet maximize the number of SNPs, we required that 75% of alleles were present within each population (east and west) for the SNP to contribute to the AFS (50% for

M. formosipes). Given these criteria, building of the observed AFS took place in three ways: (i) if either population had fewer alleles than the set threshold, that SNP was discarded, (ii) if either population had the same number of alleles as the threshold, the allele frequencies were calculated (for the total SNP) and the minor allele count was used in the AFS, (iii) if either population had a greater number of alleles than the threshold, the alleles were subsampled with replacement until the necessary number of alleles

(matching the threshold) were sampled, and then the minor allele was counted. For the

SNP that met either criterion ii or iii, the proper cell was populated in the AFS with the minor allele counts from each population. Although this down sampling procedure allowed us to include more SNPs in our analysis, it had the undesirable effect of subsampling some alleles to appear monomorphic. As we subsampled alleles at many of the SNPs, the AFS will necessarily vary across replicates. To account for variation in generating the observed AFS, we replicated the AFS building procedure 10 times.

Replication serves two purposes: (i) it accounts for variation in the subsampling process,

and (ii) it allows us to generate confidence intervals on parameter estimates for across-species comparisons.
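The AFS-building procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the custom scripts used in the study: the function names, the 0/1 allele coding, and the input layout are hypothetical. Following the text, SNPs below the threshold in either population are discarded, populations above the threshold are subsampled with replacement, and the minor allele is identified from the combined sample. A SNP at exactly 50% frequency is left as coded in this sketch.

```python
import random


def subsample(alleles, n, rng):
    """Return n alleles, or None if fewer than n are available."""
    if len(alleles) < n:
        return None                       # criterion (i): too few alleles
    if len(alleles) == n:
        return list(alleles)              # criterion (ii): use as is
    return [rng.choice(alleles) for _ in range(n)]  # criterion (iii)


def build_folded_joint_afs(snps, n_west, n_east, seed=0):
    """Build a folded two-population AFS from per-SNP allele calls.

    snps is a list of (west_alleles, east_alleles) pairs, each a list of
    0/1 calls with missing data already removed. Cell [i][j] counts SNPs
    with i minor alleles in the west and j in the east.
    """
    rng = random.Random(seed)
    afs = [[0] * (n_east + 1) for _ in range(n_west + 1)]
    for west, east in snps:
        w = subsample(west, n_west, rng)
        e = subsample(east, n_east, rng)
        if w is None or e is None:
            continue                      # SNP discarded
        count_w, count_e = sum(w), sum(e)
        # Fold: if allele 1 is the major allele overall, count allele 0.
        if 2 * (count_w + count_e) > n_west + n_east:
            count_w, count_e = n_west - count_w, n_east - count_e
        afs[count_w][count_e] += 1
    return afs


# Hypothetical SNPs with a threshold of two alleles per population.
snps = [([0, 1], [0, 0]), ([1, 1], [1, 0]), ([0], [0, 1])]
print(build_folded_joint_afs(snps, n_west=2, n_east=2))
```

Because subsampling is random, repeating the call with different seeds gives the replicated spectra referred to in the text; SNPs that appear monomorphic after subsampling fall into the [0][0] cell.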

To convert parameter estimates to real values, we used a mutation rate of 8.4 × 10^-9 per site per generation estimated from Drosophila flies (Haag-Liautard et al. 2007). Species-specific generation length estimates were gathered from the literature to scale parameters to real values. Specifically, we used two generations per year for the moth and flies, and one generation per year in the spiders; we discuss the implications of uncertainty in these estimates later. We also counted the number of invariant sites in the sequence data to populate the monomorphic cell. All FSC2 analyses were run on the Oakley cluster at the Ohio

Supercomputer Center (https://osc.edu). Each analysis (for each AFS replicate per model) was repeated 50 times, to take into account stochasticity in the simulated AFSs (as recommended by Excoffier et al. 2013). The run with the highest composite likelihood was then selected as the best run (among the 50), and parameter estimates from these runs were recorded. Custom python and bash scripts were written to generate the observed

AFS, prepare each analysis, and collate and summarize the results.
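The scaling and summarizing steps can be sketched as below. The function names and input values are hypothetical. The conversion from generations to years follows the generation lengths given above; the interval calculation is an assumption, using a normal approximation (mean ± 1.96 standard errors) across replicates, since the text does not state how the 95% intervals were computed.

```python
import statistics


def generations_to_years(tau_generations, generations_per_year):
    """Convert a divergence time in generations to years."""
    return tau_generations / generations_per_year


def mean_and_ci95(values):
    """Mean and normal-approximation 95% interval across replicates.

    The 1.96 standard-error interval is an assumption of this sketch.
    """
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / len(values) ** 0.5
    return mean, mean - half_width, mean + half_width


# Hypothetical divergence times (in generations) from three AFS replicates
# of a species with two generations per year.
replicates = [300000, 360000, 420000]
years = [generations_to_years(t, 2) for t in replicates]
print(mean_and_ci95(years))
```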

4.3. RESULTS

4.3.1. DNA Sampling

We sequenced either 24 (F. celarata, M. formosipes, S. sarraceniae) or 26

(E. semicrocea, P. viridans) individuals from each species using two HiSeq lanes and a partial MiSeq lane resulting in ~310 million sequence reads. Following demultiplexing and quality control, we retained ~215 million reads for de novo assembly (Table 7).


Between 1700 and 5300 clusters (on average) were generated per species, using a 90% within-species clustering threshold and requiring at least six reads before calling a cluster. Our final data sets—requiring at least 75% of individuals (50% for M. formosipes)—contained between 447 and 1799 loci, and between 655 and 7215 variable sites for analysis.

Table 7. Genomic sequencing data. Samples were processed through pyRAD. Loci present in at least 75% of samples for all species (50% in M. formosipes). (1) Reads that passed quality filters. (2) Clusters with at least six reads; median and range are reported.

Species         Samples (N)  Reads (1)  Clusters (2) at 90%   Loci  Variable Sites
E. semicrocea   26           81101653   5036 (1808 - 12668)   923   2933
S. sarraceniae  24           43712851   2338 (1330 - 4964)    448   1433
F. celarata     24           30563679   1760 (615 - 3999)     447   655
M. formosipes   24           25462360   5313 (434 - 14968)    1799  7215
P. viridans     26           34253839   4948 (809 - 7903)     921   1864

4.3.2. Population Genetic Structure

STRUCTURE results vary by species, but are consistent based on ecology

(Fig. 11). The moth (E. semicrocea) is partitioned into two groups on either side of the

Mississippi River, with a similar pattern recovered in one of the flies (S. sarraceniae).

Population structure in the other fly (F. celarata) is minimal, as essentially no structure is seen at the K = 2 level. This result, however, appears to be an artifact of the uneven sampling on either side of the Mississippi River, as only five flies were sampled from west of the Mississippi River (Appendix C.4; see Puechmaille 2016 for discussion of how

such uneven sampling can bias STRUCTURE results). When we randomly subsampled individuals in the eastern locales to be similar in number to the sample sizes in the west, genetic partitions were geographically clustered, recovering the east–west split

(subsampling replicated 10 times, with STRUCTURE analyses run as outlined above;

Appendix C.1). In contrast to the insects, both spiders exhibit no appreciable genetic structure, with STRUCTURE plots discordant with geography.

Results from the AMOVA are consistent with those from STRUCTURE. In both

E. semicrocea and S. sarraceniae, there is significant genetic structure at multiple levels of the analysis (two of three for the moth, all three for the fly), demonstrating strong population genetic structure in each species (Table 8). Population structure in the other fly (F. celarata) suggests significant association among localities but not at the other levels. In the spiders, genetic data are not significantly structured at any of the hierarchical levels, consistent with our inference of a loose association between the species and the pitcher plant.


[Figure 11 panels, with sampling localities ordered from west to east: E. semicrocea (BL, SD, PT, CB, RD, AS, LR, TL, DS, FC, TB); S. sarraceniae (BL, PT, CB, KS, LR, TL, DS, TB); F. celarata (CB, TL, DS, TB); M. formosipes (BL, PT, CB, RD, KS, AS, DS); P. viridans (BL, PT, CB, RD, KS, AS, LR, TL, DS, FC, TB).]

Figure 11. STRUCTURE results showing clustering of individuals at the K = 2 level for each species. See Figure 6 for sampling locality information.


Table 8. AMOVA results. Samples were partitioned within either side of the Mississippi River. Percent of variation partitioned at each level is presented.

Species         Among locales  φST     p-value  Among locales within regions  φSC     p-value  Within regions  φCT      p-value
E. semicrocea   32.45          0.4337  0.0000   10.92                         0.1616  0.1407   56.63           0.3245   0.0031
S. sarraceniae  16.03          0.2952  0.0000   13.49                         0.1606  0.0028   70.48           0.1603   0.0287
F. celarata     11.75          0.1271  0.0470   0.96                          0.0109  0.6212   87.29           0.1175   0.2467
M. formosipes   -2.51          0.1253  0.1365   15.05                         0.1468  0.1752   87.47           -0.0252  0.6205
P. viridans     1.17           0.1061  0.2076   9.44                          0.0955  0.2633   89.39           0.0117   0.0726


4.3.3. Estimating Population Size, Population Divergence, and Gene Flow

Model selection

We specified seven models for analysis using the unlinked AFS, all variations of the traditional isolation-with-migration models (Fig. 10). Results were similar across species in the sense that isolation-only models had low model probabilities, and for each species, multiple models received appreciable support (Table 9).

Table 9. Model selection using AIC and information theory. Only models that include migration generate any substantial support. See Figure 10 for model details.

Model  E. semicrocea  S. sarraceniae  F. celarata  M. formosipes  P. viridans
1      0.0132         0.0014          0            0              0
2      0              0               0            0.0001         0
3      0.6130         0.4700          0.4067       0.3065         0.4494
4      0              0               0            0.0001         0
5      0.3481         0.5138          0.2572       0.6311         0.2599
6      0.0237         0.0070          0.3340       0.0051         0.2907
7      0.0019         0.0078          0.0022       0.0572         0.00

Divergence times

Within species, parameter estimates were similar across data sets (i.e., linked versus unlinked SNPs). As most species had strong support for one of the IM models

(Table 9), parameter estimates were relatively consistent across models regardless of whether they were generated via model-averaging (from unlinked AFS) or from the full IM model (using the linked AFS). In general, unlinked AFS with model-averaged parameters contained slightly younger divergence times than linked AFS with the IM model, not surprising given the contribution of models that did not include gene flow. For the remainder of this paper, we consider parameter estimates generated from the model-averaging approach with unlinked data sets, but note that results from the other analyses are similar (see Appendix C.2–C.3, C.5).

Divergence times were restricted to the Pleistocene in all species (Fig. 12), with the precision varying across taxa. For example, estimates of population divergence in the moth (E. semicrocea) varied among replicates (Fig. 12); assuming two generations per year, the moth has an average divergence time estimate of 178,505 years before present with a 95% CI of 86,101 – 270,909 (Table 10). Estimates from the flies varied less across replicates. Assuming two generations per year, divergence time in S. sarraceniae averaged 179,688 years before present (95% CI 104,517 – 253,676), while that in F. celarata averaged 78,284 years before present (95% CI 69,516 – 87,053; Table 10). For the spiders, assuming one generation per year, divergence time estimates were older than the rest of the community: M. formosipes ~700k years before present (95% CI 611,487 –

798,347); P. viridans ~186k years before present (95% CI 171,646 – 201,887).

Collectively, divergence time estimates span from ~30k years before present to ~700k years before present (Fig. 12).


[Figure 12 plot: y-axis, Divergence Time (Kya), 0 to 800; x-axis, species grouped as moth (E. semicrocea), flies (S. sarraceniae, F. celarata), and spiders (M. formosipes, P. viridans).]

Figure 12. Divergence time estimates from FSC2. Results show estimates of divergence times in years across the Mississippi River for each of the ten replicated data sets. Mean and 95% CI are presented from model-averaging with the unlinked AFS for each species.

Population sizes

Population size estimates are reflective of the ecological association of the arthropod with the plant. As with the divergence time estimates, values are generally consistent within species regardless of whether estimates were model-averaged (with unlinked AFS; Fig. 13) or from an IM model (with linked AFS; Appendix C.3, C.5). For

the moth, population sizes in the east are roughly four times as large as those in the west

(537,789 vs. 142,269; Table 10). This same pattern is seen in both flesh flies, where population sizes in the east are roughly three to five times as large as those in the west

(S. sarraceniae: 781,252 – 147,740; F. celarata: 366,485 – 106,486; Table 10). This pattern, however, is not seen in the spiders, where roughly equal population sizes are seen on either side of the river (M. formosipes: 581,175 (E) – 585,465 (W); P. viridans:

395,020 (E) – 358,179 (W); Table 10).


[Figure 13 plot: y-axis, Ne, 0 to 1,000,000; x-axis, western (W) and eastern (E) populations for each species, grouped as moth (E. semicrocea), flies (S. sarraceniae, F. celarata), and spiders (M. formosipes, P. viridans).]

Figure 13. Effective population size estimates from populations on either side of the Mississippi River from FSC2. Results are from the ten replicated data sets. Mean and 95% CI are presented from model-averaging with the unlinked AFS for each species.

Gene Flow

Migration rates (in units of 2Nm) are lowest among the ecologically-associated species (Table 10). In the moth, migration is below 0.5 in either direction, suggesting little to no migration within this species. Similar results are seen with the flies, although values for F. celarata are above one in either direction, with an estimate of 2.57 in the east to west direction (Table 10). Migration rates are highest within the spiders. In

M. formosipes, 2Nm (west to east) = 4.83; in P. viridans, 2Nm (east to west) = 4.48 and 2Nm (west to east) = 4.22 (Table 10).


Table 10. Estimates of population genetic parameters from FSC2 from model-averaging and unlinked AFS data sets. Divergence times are in years, scaled by number of generations per year, and migration rates are in 2Nm. Values were averaged across the ten replicated data sets within each species.

Values are Mean (95% CI).

Species         tau                          Ne WEST                      Ne EAST                      MWE                 MEW
E. semicrocea   178,505 (86,101 – 270,909)   142,269 (118,036 – 166,503)  537,789 (482,742 – 592,835)  0.27 (0.16 – 0.38)  0.43 (0.05 – 0.82)
S. sarraceniae  179,688 (104,517 – 253,676)  147,740 (99,786 – 195,695)   781,252 (681,562 – 880,942)  1.14 (0.74 – 1.55)  0.59 (0 – 1.34)
F. celarata     78,284 (69,516 – 87,053)     106,486 (51,535 – 161,436)   366,485 (293,778 – 439,192)  1.12 (0.70 – 1.54)  2.57 (1.38 – 3.76)
M. formosipes   704,917 (611,487 – 798,347)  585,465 (507,250 – 663,679)  581,175 (478,575 – 683,774)  4.83 (3.92 – 5.74)  0.15 (0 – 0.47)
P. viridans     186,766 (171,646 – 201,887)  358,179 (268,132 – 448,226)  395,020 (300,822 – 489,218)  4.22 (2.47 – 5.97)  4.48 (2.57 – 6.40)

4.4. DISCUSSION

4.4.1. Diversification Patterns of the Sarracenia alata Ecological Community

Zellmer et al. (2012) estimated that S. alata dispersed across the Mississippi River at least 120,000 years before present, demonstrating a Pleistocene divergence in the pitcher plant across this biogeographic barrier. Given findings in other studies of host plants and associated arthropods sharing a phylogeographic history (e.g., Smith et al. 2011), we predict that obligate commensals of S. alata should exhibit divergence time estimates similar to or more recent than those of the plant, reflecting the arthropods' requirement of this specialized habitat for colonization following dispersal to the west side of the river.

We sampled a diverse set of arthropods, ranging in their association with the host pitcher plant from obligate commensals (moth and flies) to opportunistic capture interrupters (spiders). Our results, from the population genetic structure and parameter estimates, suggest that two of the three commensal arthropods (pitcher plant moth E. semicrocea, pitcher plant fly S. sarraceniae) dispersed across the Mississippi River during the middle Pleistocene, largely in concert with S. alata. Remarkably, divergence time estimates are within 1200 years of one another, and similar to those estimated in the pitcher plant. This suggests the association between these two arthropods and the plant has been stable for over 178k years. These results are consistent with what is known of the life history of these species. The moth spends its entire life cycle in the pitcher plant leaves, and has not been sampled outside of this habitat (Jones 1921; Stephens et al. 2011). The moths are dispersal-limited, poor flyers that display limited nocturnal movement to neighboring pitchers in the same area (Folkerts 1999; Stephens and Folkerts 2012). The flesh flies sampled here are also specific to this habitat (Dahlem and Naczi 2006), with research in another pitcher plant fly (Fletcherimyia fletcheri, found in association with Sarracenia purpurea) suggesting these flesh flies are capable movers throughout their natal bog with limited abilities to move to neighboring bogs (Krawchuk and Taylor 2003; Rasic and Keyghobadi 2012).
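The closeness of the moth and S. sarraceniae divergence times can be checked directly from the Table 10 values. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the helper names (mean_gap, intervals_overlap) are illustrative assumptions, and interval overlap is only a rough descriptive check rather than a formal test.

```python
# Mean divergence times (years) and 95% CI bounds copied from Table 10.
# The data layout and helper names are illustrative assumptions.
TAU = {
    "E. semicrocea": (178_505, 86_101, 270_909),
    "S. sarraceniae": (179_688, 104_517, 253_676),
    "F. celarata": (78_284, 69_516, 87_053),
}


def mean_gap(a, b):
    """Absolute difference between two mean divergence times, in years."""
    return abs(TAU[a][0] - TAU[b][0])


def intervals_overlap(a, b):
    """True if the 95% CIs of the two species share at least one value."""
    _, lo_a, hi_a = TAU[a]
    _, lo_b, hi_b = TAU[b]
    return lo_a <= hi_b and lo_b <= hi_a


print(mean_gap("E. semicrocea", "S. sarraceniae"))
print(intervals_overlap("E. semicrocea", "S. sarraceniae"))
print(intervals_overlap("S. sarraceniae", "F. celarata"))
```

With the tabulated means, the gap between the moth and S. sarraceniae falls under the 1200 years quoted above, and their intervals overlap broadly, whereas the intervals for the two flies do not overlap.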

The pattern identified in the other flesh fly (F. celarata) is intriguing, and possibly reflective of interspecific competition. Both flies are tightly associated with the plant; each uses the pitcher leaves for larval deposition and is not known to live outside of this specialized habitat (Dahlem and Naczi 2006). Population divergence estimates from F. celarata are much more recent (~78k years) than those from S. sarraceniae, suggesting that this species dispersed across the Mississippi River after western populations of the plant and S. sarraceniae were already well established. In our extensive field work, we were only able to collect five flies of F. celarata from the western locales (all from Cooter’s Bog); in contrast, we collected 73 S. sarraceniae individuals from the west. A series of specimens is known from Warren, Texas (see Dahlem and Naczi 2006), but we were unable to locate any individuals of this species in any other western locale. These five samples are monophyletic in their mitochondrial DNA (Satler and Carstens 2016), and population genetic parameters support their east-to-west dispersal and structure (following subsampling and replication in STRUCTURE; Appendix C.1). Two potential factors could explain these results. First, the limited sampling and the population genetic parameter estimates may reflect biological reality. Pitcher plant flesh flies are ovolarviparous, with females depositing a single larva per pitcher. Larvae are aggressive and territorial, actively attacking other flesh fly larvae when present (Rango 1999; Dahlem and Naczi 2006; Forsyth and Robertson 1975). As these two flesh fly species fill the same ecological niche, interspecific competition could explain the higher numbers of S. sarraceniae consistently found at pitcher plant locales in the west. Given our estimated divergence times, S. sarraceniae would have had substantially more time than F. celarata to become established at plant populations west of the Mississippi River, leading to its higher abundance in our sampling efforts. Alternatively, the younger divergence time recovered in F. celarata could be an artifact of limited sampling. Although we were able to sample up to 10 alleles per locus for the western individuals, limited geographic sampling combined with lower allele counts may have precluded us from generating accurate estimates of divergence times. We note, however, that all other population genetic parameters support an east-to-west dispersal in this species, with population structure mirroring that of the pitcher plant, highlighting the tight relationship between the flesh fly and host plant.

While divergence times vary among these three arthropod species, suggesting two trans-river dispersal events, a more compelling result is that the effective population sizes of these three species are largely concordant on either side of the river, with a pattern mirroring that of the host plant in that they are much larger in the east than in the west. In the plant, effective population sizes are estimated to be ~6.5 times as large in the east as in the west (Zellmer et al. 2012); in these three arthropods, we recover population sizes between three and five times as large in the east as in the west. These results are consistent with a growing biogeographic understanding of this system. Stephens et al. (2015) proposed a center of origin for Sarracenia in the southeast, either at the southern Appalachian Mountains or Apalachicola region, where each of the other Sarracenia species is distributed. In addition to S. alata being the only member of the genus found west of the Mississippi River, population genetic patterns in this species support this hypothesis, with colonization of the west from eastern populations (Koopman et al. 2010). Results from the insects are consistent with this scenario.
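The east-to-west contrast in effective population size can be illustrated by taking ratios of the mean Ne values in Table 10. This is a minimal sketch under stated assumptions: it uses only the tabulated means (not the intervals), the data layout and the function name east_west_ratio are hypothetical, and the ratios reported in the text may have been derived differently.

```python
# Mean effective population sizes (west, east) copied from Table 10.
# The data layout and function name are illustrative assumptions.
NE = {
    "E. semicrocea": (142_269, 537_789),
    "S. sarraceniae": (147_740, 781_252),
    "F. celarata": (106_486, 366_485),
    "M. formosipes": (585_465, 581_175),
    "P. viridans": (358_179, 395_020),
}


def east_west_ratio(species):
    """Ratio of the eastern to the western mean Ne for one species."""
    west, east = NE[species]
    return east / west


for species in NE:
    print(f"{species}: east/west Ne ratio = {east_west_ratio(species):.2f}")
```

Computed this way, the moth and the two flies give ratios of roughly three to five, while both spiders sit close to one, which is the contrast described above.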

How did the S. alata community disperse across the Mississippi River? Sarracenia seeds are tiny (Ellison 2001) and lack modifications for long-range dispersal. Ellison and Parker (2002) recovered most seeds of Sarracenia purpurea within five cm of the parent plant, suggesting limited seed dispersal in these plants. We follow Zellmer et al. (2012) in suggesting that a likely scenario is that the course of the river changed to effectively move some habitat from the east side to the west via the process of oxbow lake formation (Gascon et al. 2000). The lower Mississippi River is a dynamic system, with tremendous change in movement and flow during the Pleistocene (Coleman 1988; Mann and Thomas 1968). Such a process would provide the opportunity for mature plants and their commensal arthropods to move as a single unit across the river.

In contrast, the two spider species exhibit markedly different population genetic patterns in the STRUCTURE results (Fig. 11) and AMOVA results (Table 8). Rates of gene flow across the Mississippi River are high in each species (Table 10), and population sizes do not mirror those in the plant. Given their abilities for long-range dispersal (via ballooning), the data suggest ongoing movement of these spiders across the Mississippi River. Divergence time estimates in the crab spider (M. formosipes) are much older (~700k years before present) than those of the other species, while those in the green lynx spider (P. viridans) are similar to those of the other species analyzed here. Both spiders are described as capture interrupters (Folkerts 1999) that exploit the insect-attracting abilities of Sarracenia; they are commonly found in association with the pitcher plant but are not limited to this specialized habitat for survival and reproduction. Results suggest a complex history for both spiders, but ongoing gene flow and no discernible genetic structure support a lack of co-diversification with the pitcher plant.

4.4.2. Challenges with Comparing Divergence Times across a Biogeographic Barrier

Investigating the timing of diversification across biogeographic barriers is of central importance to comparative phylogeography, as a clustering of divergence times suggests a shared response to a historical event (Bermingham and Moritz 1998). Accurately estimating divergence times, however, is challenging, particularly when the focal species are sampled from disparate taxonomic groups. Before the incorporation of the coalescent model (Kingman 1982) by the discipline, researchers would calculate genetic distances across a barrier and then calibrate those divergences with a mutation rate to generate absolute times for comparison. Often, these estimates varied widely, leading to the conclusion that a single vicariant event did not occur (Klicka and Zink 1997; Knowlton and Weigt 1998). Methods incorporating the coalescent model allow the timing of population divergence to be directly estimated, potentially leading to more precise inferences of community divergence (Hickerson et al. 2006). Although coalescent-based approaches account for the stochasticity of allele coalescence, they can still be prone to wide confidence intervals around estimates of τ, particularly with limited genetic data (Arbogast et al. 2002). In this study, we use hundreds to thousands of SNPs to infer evolutionary history among co-occurring species, and recover a range of divergence times that would suggest a lack of a single divergence episode.

Phylogeography has assumed since Edwards and Beerli (2000) that more data would lead to more precise estimates of population divergence, and thus facilitate comparative studies across biogeographic barriers. Next generation sequencing has made the collection of large amounts of data feasible for phylogeography (McCormack et al. 2013). However, comparative investigations require two assumptions to convert parameter estimates into real values: mutation rate and generation length. Within the same taxonomic groups, these values are typically assumed to be the same across taxa (e.g., Smith et al. 2014b; Papadopoulou and Knowles 2015). In studies such as ours, however, comparison of species that are distantly related to one another is complicated by a lack of information about these values. Here, we used a mutation rate estimate taken from Drosophila flies (Haag-Liautard et al. 2007). Although this mutation rate might be relatively accurate for Drosophila, its relevance to the more distantly related dipterans, lepidopterans, and arachnids analyzed here is suspect because the three groups likely diverged before the Cambrian (e.g., Rehm et al. 2011). Perhaps a larger concern is generation length. In this study, we investigated small arthropods whose life history traits are relatively unknown. We are somewhat confident that araneomorph spiders have one generation per year (Foelix 1982), but are less certain about the remaining arthropods.

The moth and the flies are suggested to have multiple generations per year, but exact values are unknown (Folkerts 1999). Moon et al. (2008) suggested Exyra has two generations per year, and this value is consistent with estimates from other moths in the Noctuidae family (e.g., Spitzer et al. 1984). For the flies, we based our estimate of two generations per year on research conducted in another pitcher plant flesh fly (Fletcherimyia fletcheri) that is associated with Sarracenia purpurea. F. fletcheri is estimated to have one generation per year at the higher latitudes in northeastern United States and Canada (Rango 1999; Rasic and Keyghobadi 2012), where pitcher leaves are active for ~4–8 weeks (Fish and Hall 1978). In S. alata, leaves appear to be active for at least three months, which, combined with the longer growing season, provides a longer period for the flies to go through their life cycle. Therefore, we believe it is reasonable to use a value of two generations per year. This is consistent with generation time estimates in other flesh flies (in the genus Sarcophaga), which suggest 2–3 generations per year in the temperate regions, with generation times taking up to 60 days (Denlinger 1978), although generation time can be shorter depending on day length and temperature (Chen et al. 1987; Lee et al. 1987). While the data suggest two generations per year may be appropriate, there is clearly uncertainty in this assumption. Furthermore, seasonal and yearly fluctuations in climate and environment will influence the number of generations that these flies have per year. In the cooler Pleistocene Epoch, climatic factors may have led to a shorter breeding season and a longer time to reach sexual maturity, leading to fewer generations per year. To illustrate, Figure 14 shows how our divergence time estimates (using S. sarraceniae as an example) and interpretation of the results would change depending on the number of generations assumed.

[Figure 14 plot. Panel: S. sarraceniae. Y-axis: divergence time (Kya), 0 to 500. X-axis: generations per year, 1 to 6.]

Figure 14. Effect of generation length on divergence time estimates. These are estimated divergence time values (mean and 95% CI) for S. sarraceniae from model-averaging and unlinked AFS, scaled by number of generations per year. Between one and three generations per year would result in a divergence time similar to estimates in S. alata, suggesting co-diversification. This demonstrates that our inferences are dependent on the values assumed, and highlights the difficulties inherent to conducting comparative phylogeographic investigations using parameter estimates.
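The dependence of the divergence time on the assumed generation rate can be sketched as a simple rescaling. This illustration assumes that the S. sarraceniae values in Table 10 are expressed in years under two generations per year (the value adopted for the flies in the text), so that the estimate in generations is the tabulated value multiplied by two. The function and variable names are hypothetical, and this is not a reproduction of the original FSC2 workflow.

```python
# Table 10 values for S. sarraceniae, in years. Assumption: these were
# scaled using two generations per year, as stated in the text for the flies.
ASSUMED_GENS_PER_YEAR = 2
TAU_YEARS = {"mean": 179_688, "ci_low": 104_517, "ci_high": 253_676}


def rescale(years, gens_per_year, baseline=ASSUMED_GENS_PER_YEAR):
    """Re-express a divergence time (years) under a different generation rate."""
    generations = years * baseline       # divergence time in generations
    return generations / gens_per_year   # years under the new assumption


def rescaled_estimates(gens_per_year):
    """Mean and CI bounds rescaled to the given generations per year."""
    return {key: rescale(value, gens_per_year) for key, value in TAU_YEARS.items()}


for gens in range(1, 7):
    est = rescaled_estimates(gens)
    print(f"{gens} gen/yr: {est['mean']:,.0f} years "
          f"({est['ci_low']:,.0f} to {est['ci_high']:,.0f})")
```

Under these assumptions, three generations per year would place the mean near the minimum of 120,000 years estimated for S. alata, and higher generation rates would push the estimate below it.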

Ultimately, investigating temporal diversification in a set of co-distributed species requires precise estimates of important parameters (e.g., τ). Even with next-generation sequencing technologies providing genome-wide sequence data for non-model organisms, lack of reliable values for mutation rate and generation length may limit such studies to those of closely related species. This is unfortunate, as phylogeographic investigations of ecological communities could lead to the identification of evolutionary communities (Smith et al. 2011; Carstens et al. 2016) and to a better understanding of the evolution of ecosystems (Carstens et al. 2005) and landscapes (Bermingham and Moritz 1998). Such investigations hold the potential to connect comparative phylogeography to community ecology by providing information pertinent to how communities are assembled and maintained through time. Investigating ecosystems and communities in an evolutionary framework provides insight into the diversification processes structuring these systems, and tests the idea of Darwin’s tangled bank (Darwin 1859).

4.5. CONCLUSIONS

Our results suggest that S. alata and at least two of its commensal arthropods dispersed across the Mississippi River in a concerted manner, likely facilitated via oxbow lake formation (Koopman et al. 2010). Population genetic structure and effective population sizes in the three obligate species (moth and both flies) suggest an east to west dispersal, consistent with colonization as a single unit with S. alata. However, divergence time estimates are younger in one of the flies; this could reflect a more recent dispersal event to the western plant populations, or result from difficulty in accurately estimating divergence times for this species (either due to limited sampling or incorrect assumptions of mutation rate or generation length). In contrast to the insects, both spiders exhibit phylogeographic patterns that differ from those of S. alata, supporting previous inferences that they are only loosely associated with this plant. Here we provide evidence that the members of the S. alata pitcher plant system dispersed across the Mississippi River as a single unit, demonstrating that communities can evolve in an ecological entanglement.

Chapter 5: Conclusion

“It is interesting to contemplate an entangled bank, clothed

with many plants of many kinds, with birds singing on the

bushes, with various insects flitting about, and with worms

crawling through the damp earth, and to reflect that these

elaborately constructed forms, so different from each other,

and dependent on each other in so complex a manner, have

all been produced by the laws acting around us.”

(Darwin 1859)

Understanding the evolutionary history of ecological communities is of central importance to biological research, and provides insight into the processes that drive community assembly, community structure, and speciation. There are almost two million described species, with interactions among species characterizing all habitats on the planet. I am interested in understanding how and why certain species came to exist within a community, and if those relationships have been stable through time. I have explored these questions in the Sarracenia alata pitcher plant community, a unique system characterized by the disparate lineages that interact with the plant. Results have elucidated the taxonomic diversity comprising this inquiline community, and have demonstrated shared phylogeographic patterns indicative of co-diversification.

A diverse set of taxa was recovered from the pitcher plant fluid, and this is the first study to characterize the taxonomic lineages associated with S. alata (Chapter 2). In particular, I was able to sample multiple OTUs that span the Mississippi River, and population genetic analyses suggest that roughly half of the taxa are geographically structured similarly to the plant. This suggests a shared evolutionary history among multiple inquilines and the host pitcher plant. Importantly, this approach utilizes amplicon resequencing to rapidly and efficiently generate a comparative data set for phylogeographic analysis. Amplicon resequencing could be leveraged by phylogeographers for rapidly expanding the number and breadth of taxonomic lineages in a comparative analysis.

Given the information available on the arthropods that interact with Sarracenia plants, I decided to investigate their ecological interactions in an evolutionary framework. Comparative phylogeography has historically been a discipline making post-hoc interpretations of genealogical patterns in a qualitative manner. I found this unsatisfying and developed a new quantitative method for the field of comparative phylogeography, Phylogeographic Concordance Factors (PCFs). Evaluating comparative patterns among genealogies has been a central objective of the discipline (Avise 2000), yet this has never been done in a quantitative manner. PCFs accomplish this, and take advantage of the species tree paradigm by extending a gene tree / species tree approach an additional level in the hierarchy, to the community tree. This method uses the program BUCKy (Ané et al. 2007; Larget et al. 2010), and estimates a community tree from a set of species trees, with the tips of the trees represented as geographic areas. For Chapter 3, I used three empirical data sets to show the efficacy of the method, and analyzed simulated data to generate expectations when a community has co-diversified. In the S. alata system, the spider is identified as a transient member, with ecology being a good predictor of a shared phylogeographic history. These results suggest PCFs can be a useful contribution to the phylogeographer’s toolbox.

Results from Chapters 2 and 3 highlight the Mississippi River as a promoter of diversification; this is seen in the pitcher plant (Koopman et al. 2010; Zellmer et al. 2012; Carstens and Satler 2013) and across numerous taxa (Soltis et al. 2006). I further explored the effects of this biogeographic barrier in a temporal framework in Chapter 4. I generated genome-wide SNP data for five arthropods, ranging in ecological association from obligate (moth, flies) to facultative (spiders). In particular, the moth and flies are only known from this habitat, and have an intimate relationship with the plant for all of their life cycle (Jones 1921; Dahlem and Naczi 2006; Stephens et al. 2012), whereas the spiders are widespread predators not restricted to this habitat (Folkerts 1999). Given the antiquity of the Mississippi River, and the younger age of the pitcher plant system, this community necessarily dispersed across the river during the Pleistocene. Remarkably, divergence times for the moth and one of the flesh flies are nearly identical (~180k years before present, separated by ~1200 years), with the other flesh fly estimated to have diverged more recently (~80k years before present). All three of these arthropods exhibit much larger population sizes in the east than in the west (between three and five times), a compelling result reflecting patterns seen in S. alata (Zellmer et al. 2012). This suggests the community dispersed across the Mississippi River from the east, further supported by the biogeographic hypothesis that the center of origin of Sarracenia is at the southern Appalachians (Stephens et al. 2015). Based on divergence times, the plant, moth, and one of the flies dispersed ~180k years before present, likely facilitated by meandering and oxbow lake formation, whereby large swaths of land were moved from one side of the river to the other (e.g., Gascon et al. 2000). It is estimated that the other fly dispersed to the west at a more recent time; this could be due to the same processes facilitating trans-barrier dispersal as before, or long distance dispersal via flight. These flies are capable movers within a bog, although long-range dispersal ability is unknown (Krawchuk and Taylor 2003). In contrast, population genetic parameters of the spiders do not suggest evidence of co-diversification. Migration rates are highest for these two species, suggesting their intrinsic abilities for escaping the effects of the Mississippi River. One of the spiders (P. viridans), however, does show a similar timing of divergence to the moth and fly, so it cannot be ruled out that the processes driving diversification in those species influenced this species in a similar manner.

Charles Darwin’s tangled bank is perhaps the most famous passage from his seminal book outlining the theory of evolution by natural selection (Darwin 1859). Ecological interactions characterize all species inhabiting the planet. These range on a continuum, from little to no association to tightly linked associations. While these interactions are thought to influence coevolutionary patterns, there are surprisingly few examples to date that display this in nature. My dissertation provides evidence for co-diversification in an ecological community, where important associations have been stable for a long time. I have developed a novel method for understanding which species have co-diversified with the pitcher plant S. alata, and have identified the Mississippi River as a major factor driving community structure and diversification in the system.

This work has clear conservation implications, as the loss of the pitcher plant would necessarily result in the loss of numerous species, species that have evolved for life in this unique habitat. Sarracenia alata and its inquiline community persist in an ecological entanglement, with evidence that interactions among multiple members of this unique community have been stable well into the Pleistocene.

References

Addicott JF (1974) Predation and prey community structure: an experimental study of the

effect of larvae on the protozoan communities of pitcher plants. Ecology, 55, 475–

492.

Adlassnig W, Peroutka M, Lendl T (2011) Traps of carnivorous pitcher plants as a

habitat: composition of the fluid, biodiversity and mutualistic activities. Annals of

Botany, 107, 181–194.

Akaike H (1974) A new look at the statistical model identification. IEEE Transactions on

Automatic Control, 19, 716–723.

Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of

concordance among gene trees. Molecular Biology and Evolution, 24, 412–426.

Arbogast BS, Kenagy GJ (2001) Comparative phylogeography as an integrative approach

to historical biogeography. Journal of Biogeography, 28, 819–825.

Arbogast BS, Edwards SV, Wakeley J, Beerli P, Slowinski JB (2002) Estimating

divergence times from molecular data on phylogenetic and population genetic

timescales. Annual Review of Ecology and Systematics, 33, 707–740.

Arnold B, Corbett-Detig RB, Hartl D, Bomblies K (2013) RADseq underestimates

diversity and introduces genealogical biases due to nonrandom haplotype

sampling. Molecular Ecology, 22, 3179–3190.

Avise JC, Arnold J, Ball RM, Bermingham E, Lamb T, Neigel JE, Reeb CA, Saunders

NC (1987) Intraspecific phylogeography: the mitochondrial DNA bridge between

population genetics and systematics. Annual Review of Ecology and Systematics,

18, 489–522.

Avise JC (2000) Phylogeography: the history and formation of species. Harvard Univ.

Press, Cambridge, MA.

Bacon CD, Silvestro D, Jaramillo C, Smith BT, Chakrabarty P, Antonelli A (2015)

Biological evidence supports an early and complex emergence of the Isthmus of

Panama. Proceedings of the National Academy of Sciences, USA, 112, 6110–

6115.

Baum D (2007) Concordance trees, concordance factors, and the exploration of reticulate

genealogy. Taxon, 56, 417–426.

Bell RC, MacKenzie JB, Hickerson MJ, Chavarria KL, Cunningham M, Williams S,

Moritz C (2012) Comparative multi-locus phylogeography confirms multiple

vicariance events in co-distributed rainforest frogs. Proceedings of the Royal

Society B: Biological Sciences, 279, 991–999.

Bermingham E, Moritz C (1998) Comparative phylogeography: concepts and

applications. Molecular Ecology, 7, 367–369.

Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R, Thomas WK (2012)

Sequencing our way towards understanding global eukaryotic biodiversity.

Trends in Ecology & Evolution, 27, 233–243.

Bowen BW, Shanker K, Yasuda N, Malay MCD, von der Heyden S, Paulay G, Rocha

LA, Selkoe KA, Barber PH, Williams ST, Lessios HA, Crandall ED, Bernardi G,

Meyer CP, Carpenter KE, Toonen RJ (2014) Phylogeography unplugged:

comparative surveys in the genomic era. Bulletin of Marine Science, 90, 13–46.

Bradshaw WE, Creelman RA (1984) Mutualism between the carnivorous purple pitcher

plant and its inhabitants. American Midland Naturalist, 112, 294–304.

Brant SV, Orti G (2003) Phylogeography of the Northern short-tailed shrew, Blarina

brevicauda (Insectivora: Soricidae): past fragmentation and postglacial

recolonization. Molecular Ecology, 12, 1435–1449.

Brewer JS (2001) A demographic analysis of fire-stimulated seedling establishment of

Sarracenia alata (Sarraceniaceae). American Journal of Botany, 88, 1250–1257.

Bryant D, Bouckaert R, Felsenstein J, Rosenberg NA, RoyChoudhury A (2012) Inferring

species trees directly from biallelic genetic markers: bypassing gene trees in a full

coalescent analysis. Molecular Biology and Evolution, 29, 1917–1932.

Buckley HL, Miller TE, Ellison AM, Gotelli NJ (2003) Reverse latitudinal trends in

species richness of pitcher-plant food webs. Ecology Letters, 6, 825–829.

Burbrink FT, Lawson R, Slowinski JB (2000) Mitochondrial DNA phylogeography of

the polytypic North American rat snake (Elaphe obsoleta): a critique of the

subspecies concept. Evolution, 54, 2107–2118.

Burnham KP, Anderson DR (2002) Model Selection and Multimodel Inference: A

Practical Information-Theoretic Approach, 2nd edn. Springer-Verlag, New York.

Carstens BC, Brunsfeld SJ, Demboski JR, Good JM, Sullivan J (2005) Investigating the

evolutionary history of the Pacific Northwest mesic forest ecosystem: hypothesis

testing within a comparative phylogeographic framework. Evolution, 59, 1639–

1652.

Carstens BC, Brennan RS, Chua V, Duffie CV, Harvey MG, Koch RA, McMahan CD,

Nelson BJ, Newman CE, Satler JD, Seeholzer G, Posbic K, Tank DC, Sullivan J

(2013) Model selection as a tool for phylogeographic inference: an example from

the willow Salix melanopsis. Molecular Ecology, 22, 4014–4028.

Carstens BC, Satler JD (2013) The carnivorous plant described as Sarracenia alata

contains two cryptic species. Biological Journal of the Linnean Society, 109, 737–

746.

Carstens BC, Gruenstaeudl M, Reid NM (2016) Community trees: identifying

codiversification in the Páramo dipteran community. Evolution, 70, 1080–1093.

Chen C, Denlinger DL, Lee RE (1987) Responses of nondiapausing flesh flies (Diptera:

Sarcophagidae) to low rearing temperatures: developmental rate, cold tolerance,

and glycerol concentrations. Annals of the Entomological Society of America, 80,

790–796.

Clayton DH, Bush S, Johnson K (2004) Ecology of congruence: past meets present.

Systematic Biology, 53, 165–173.

Coates AG, McNeill DF, Aubry MP, Berggren WA, Collins LS (2005) An introduction

to the geology of the Bocas del Toro archipelago, Panama. Caribbean Journal of

Science, 41, 374–391.

Coleman JM (1988) Dynamic changes and processes in the Mississippi River delta.

Geological Society of America Bulletin, 100, 999–1015.

Cowman PF, Bellwood DR (2013) Vicariance across major marine biogeographic

barriers: temporal concordance and the relative intensity of hard versus soft

barriers. Proceedings of the Royal Society B: Biological Sciences, 280, 20131541.

Crews SC, Hedin M (2006) Studies of morphological and molecular phylogenetic

divergence in spiders (Araneae: Homalonychus) from the American southwest,

including divergence along the Baja California Peninsula. Molecular

Phylogenetics and Evolution, 38, 470–487.

Cummings MP, Neel MC, Shaw KL (2008) A genealogical approach to quantifying

lineage divergence. Evolution, 62, 2411–2422.

Dahlem GA, Naczi RC (2006) Flesh flies (Diptera: Sarcophagidae) associated with North

American pitcher plants (Sarraceniaceae), with descriptions of three new species.

Annals of the Entomological Society of America, 99, 218–240.

Darwin C (1859) The origin of species by means of natural selection, or the preservation

of favoured races in the struggle for life. John Murray, London, UK.

Darwin C (1875) Insectivorous plants. John Murray, London, UK.

Dawson MN (2013) Natural experiments and meta-analyses in comparative

phylogeography. Journal of Biogeography, 41, 52–65.

Dellicour S, Mardulyn P (2014) SPADS 1.0: a toolbox to perform spatial analyses on

DNA sequence data sets. Molecular Ecology Resources, 14, 647–651.

Denlinger DL (1978) The developmental response of flesh flies (Diptera: Sarcophagidae)

to tropical seasons. Oecologia, 35, 105–107.

Dyer RJ (2012) The gstudio package. Virginia: Virginia Commonwealth University.

Eaton DA (2014) PyRAD: assembly of de novo RADseq loci for phylogenetic analyses.

Bioinformatics, 30, 1844–1849.

Edger PP, Heidel-Fischer HM, Bekaert M, Rota J, Glöckner G, Platts AE, Heckel DG,

Der JP, Wafula EK, Tang M, Hofberger JA, Smithson A, Hall JC, Blanchette M,

Bureau TE, Wright SI, dePamphilis CW, Schranz ME, Barker MS, Conant GC,

Wahlberg N, Vogel H, Pires JC, Wheat CW (2015) The butterfly plant arms-race

escalated by gene and genome duplications. Proceedings of the National Academy

of Sciences, USA, 112, 8362–8366.

Edgar RC (2013) UPARSE: highly accurate OTU sequences from microbial amplicon

reads. Nature Methods, 10, 996–998.

Edwards SV, Beerli P (2000) Gene divergence, population divergence, and the variance

in coalescence time in phylogeographic studies. Evolution, 54, 1839–1854.

Edwards SV, Liu L, Pearl DK (2007) High-resolution species trees without

concatenation. Proceedings of the National Academy of Sciences, USA, 104,

5936–5941.

Ehrlich PR, Raven PH (1964) Butterflies and plants: a study in coevolution. Evolution,

18, 586–608.

Ellison AM (2001) Interspecific and intraspecific variation in seed size and germination

requirements of Sarracenia (Sarraceniaceae). American Journal of Botany, 88,

429–437.

Ellison AM, Gotelli NJ (2001) Evolutionary ecology of carnivorous plants. Trends in

Ecology & Evolution, 16, 623–629.

Ellison AM, Parker JN (2002) Seed dispersal and seedling establishment of Sarracenia

purpurea (Sarraceniaceae). American Journal of Botany, 89, 1024–1026.

Ellison AM, Gotelli NJ (2009) Energetics and the evolution of carnivorous plants—

Darwin’s ‘most wonderful plants in the world’. Journal of Experimental Botany,

60, 19–42.

Espíndola A, Carstens BC, Alvarez N (2014) Comparative phylogeography of mutualists

and the effect of the host on the genetic structure of its partners. Biological

Journal of the Linnean Society, 113, 1021–1035.

Excoffier L, Smouse PE, Quattro JM (1992) Analysis of molecular variance inferred

from metric distances among DNA haplotypes: application to human

mitochondrial DNA restriction data. Genetics, 131, 479–491.

Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust

demographic inference from genomic and SNP data. PLoS Genetics, 9, e1003905.

Fagundes NJ, Ray N, Beaumont M, Neuenschwander S, Salzano FM, Bonatto SL,

Excoffier L (2007) Statistical evaluation of alternative models of human

evolution. Proceedings of the National Academy of Sciences, USA, 104, 17614–

17619.

Farrell B, Mitter C (1990) Phylogenesis of insect/plant interactions: have Phyllobrotica

leaf (Chrysomelidae) and the Lamiales diversified in parallel? Evolution,

44, 1389–1403.

Fashing NJ, OConnor BM (1984) Sarraceniopus–a new genus for Histiostomatid mites

inhabiting the pitchers of the Sarraceniaceae (Astigmata: Histiostomatidae).

International Journal of , 10, 217–227.

Felsenstein J (2006) Accuracy of coalescent likelihood estimates: do we need more sites,

more sequences, or more loci? Molecular Biology and Evolution, 23, 691–700.

Finlay BJ (2002) Global dispersal of free-living microbial eukaryote species. Science,

296, 1061–1063.

Fish D, Hall DW (1978) Succession and stratification of aquatic insects inhabiting the

leaves of the insectivorous pitcher plant, Sarracenia purpurea. American Midland

Naturalist, 99, 172–183.

Foelix RF (1982) Biology of spiders. Harvard University Press.

Folkerts D (1999) Pitcher plant wetlands of the southeastern United States. Pp. 247–275

in Batzer DP, Rader RB, Wissinger SA, eds. Invertebrates in Freshwater

Wetlands of North America: Ecology and Management. John Wiley and Sons,

Inc., New York, NY.

Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for

amplification of mitochondrial cytochrome c oxidase subunit I from diverse

metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3, 294–

299.

Forsyth AB, Robertson RJ (1975) K reproductive strategy and larval behavior of the

pitcher plant sarcophagid fly, Blaesoxipha fletcheri. Canadian Journal of

Zoology, 53, 174–179.

Fouquet A, Noonan BP, Rodrigues MT, Pech N, Gilles A, Gemmell NJ (2012) Multiple

Quaternary refugia in the eastern Guiana shield revealed by comparative

phylogeography of 12 frog species. Systematic Biology, 61, 461–489.

Francis RM (2016) pophelper: an r package and web app to analyse and visualize

population structure. Molecular Ecology Resources, in press.

Funk WC, Caldwell JP, Peden CE, Padial JM, De la Riva I, Cannatella DC (2007) Tests

of biogeographic hypotheses for diversification in the Amazonian forest frog,

Physalaemus petersi. Molecular Phylogenetics and Evolution, 44, 825–837.

Garrick RC, Rowell DM, Simmons CS, Hillis DM, Sunnucks P (2008) Fine-scale

phylogeographic congruence despite demographic incongruence in two low-mobility

saproxylic springtails. Evolution, 62, 1103–1118.

Gascon C, Malcolm JR, Patton JL, da Silva MN, Bogart JP, Lougheed SC, Peres CA,

Neckel S, Boag PT (2000) Riverine barriers and the geographic distribution of

Amazonian species. Proceedings of the National Academy of Sciences, USA, 97,

13672–13677.

Gotelli NJ, Ellison AM (2006) Food-web models predict species abundances in response

to habitat change. PLoS Biology, 4, e324.

Greenstone MH, Morgan CE, Hultsch A, Farrow RA, Dowse JE (1987) Ballooning

spiders in Missouri, USA, and New South Wales, Australia: family and mass

distributions. Journal of , 15, 163–170.

Guindon S, Gascuel O (2003) A simple, fast, and accurate algorithm to estimate large

phylogenies by maximum likelihood. Systematic Biology, 52, 696–704.

Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD (2009) Inferring the

joint demographic history of multiple populations from multidimensional SNP

frequency data. PLoS Genetics, 5, e1000695.

Haag-Liautard C, Dorris M, Maside X, Macaskill S, Halligan DL, Charlesworth B,

Keightley PD (2007) Direct estimation of per nucleotide and genomic deleterious

mutation rates in Drosophila. Nature, 445, 82–85.

Haag-Liautard C, Coffey N, Houle D, Lynch M, Charlesworth B, Keightley PD (2008)

Direct estimation of the mitochondrial DNA mutation rate in Drosophila

melanogaster. PLoS Biology, 6, e204.

Hafner MS, Demastes JW, Spradling TA, Reed DL (2003) Cophylogeny between pocket

gophers and chewing lice. Pp. 195–218 in Tangled trees: phylogeny, cospeciation,

and coevolution (R.D.M. Page, ed.). University of Chicago Press, Chicago.

Hafner DJ, Riddle BR (2005) Mammalian phylogeography and evolutionary history of

northern Mexico’s deserts. Biodiversity, ecosystems, and conservation in northern

Mexico (J.-LE Cartron, G. Ceballos, and RS Felger, eds.). Oxford University

Press, New York, 225–245.

Harvey E, Miller TE (1996) Variance in composition of inquiline communities in leaves

of Sarracenia purpurea L. on multiple spatial scales. Oecologia, 108, 562–566.

Hausner G, Reid J, Klassen GR (1993) On the subdivision of Ceratocystis s.l., based on

partial ribosomal DNA sequences. Canadian Journal of Botany, 71, 52–63.

Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data.

Molecular Biology and Evolution, 27, 570–580.

Hebert PDN, Ratnasingham S, deWaard JR (2003) Barcoding animal life: cytochrome c

oxidase subunit 1 divergences among closely related species. Proceedings of the

Royal Society B: Biological Sciences, 270, s96–s99.

Hewitt G (2000) The genetic legacy of the Quaternary ice ages. Nature, 405, 907–913.

Hewitt GM (2004) Genetic consequences of climatic oscillations in the Quaternary.

Philosophical Transactions of the Royal Society of London B: Biological

Sciences, 359, 183–195.

Hickerson MJ, Stahl EA, Lessios HA (2006) Test for simultaneous divergence using

approximate bayesian computation. Evolution, 60, 2435–2453.

Hickerson MJ, Stahl E, Takebayashi N (2007) msBayes: pipeline for testing comparative

phylogeographic histories using hierarchical approximate Bayesian computation.

BMC Bioinformatics, 8, 268.

Hickerson MJ, Carstens BC, Cavender-Bares J, Crandall KA, Graham CH, Johnson JB,

Rissler L, Victoriano PF, Yoder AD (2010) Phylogeography’s past, present, and

future: 10 years after Avise, 2000. Molecular Phylogenetics and Evolution, 54,

291–301.

Hope AG, Ho SYW, Malaney JL, Cook JA, Talbot SL (2014) Accounting for rate

variation among lineages in comparative demographic analyses. Evolution, 68,

2689–2700.

Huang W, Takebayashi N, Qi Y, Hickerson MJ (2011) MTML-msBayes: Approximate

Bayesian comparative phylogeographic inference from multiple taxa and multiple

loci with rate heterogeneity. BMC Bioinformatics, 12, 1.

Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic

variation. Bioinformatics, 18, 337–338.

Hudson RR, Coyne JA (2002) Mathematical consequences of the genealogical species

concept. Evolution, 56, 1557–1565.

Hugall A, Moritz C, Moussalli A, Stanisic J (2002) Reconciling paleodistribution models

and comparative phylogeography in the Wet Tropics rainforest land snail

Gnarosophia bellendenkerensis (Brazier 1875). Proceedings of the National

Academy of Sciences, USA, 99, 6112–6117.

Hunter PE, Hunter CA (1964) A new Anoetus mite from pitcher plants. Proceedings of

the Entomological Society of Washington, 66, 39–46.

Jackson ND, Austin CC (2010) The combined effects of rivers and refugia generate

extreme cryptic fragmentation within the common ground skink (Scincella

lateralis). Evolution, 64, 409–428.

Jensen JL, Bohonak AJ, Kelley ST (2005) Isolation by distance, web service. BMC

Genetics, 6, 13. v.3.23 http://ibdws.sdsu.edu/

Johnson MTJ, Stinchcombe JR (2007) An emerging synthesis between community

ecology and evolutionary biology. Trends in Ecology & Evolution, 22, 250–257.

Jombart T (2008) adegenet: a r package for the multivariate analysis of genetic markers.

Bioinformatics, 24, 1403–1405.

Jones FM (1921) Pitcher plants and their moths. Natural History, 21, 297–316.

Jousselin E, van Noort S, Berry V, Rasplus J, Rønsted N, Erasmus JC, Greeff JM (2008)

One fig to bind them all: host conservatism in a fig wasp community unraveled by

cospeciation analyses among pollinating and nonpollinating fig wasps. Evolution,

62, 1777–1797.

Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7:

improvements in performance and usability. Molecular Biology and Evolution,

30, 772–780.

Kingman JFC (1982) The coalescent. Stochastic processes and their applications, 13,

235–248.

Kitching RL (2000) Food webs and container habitats: the natural history and ecology of

phytotelmata. Cambridge University Press.

Klicka J, Zink RM (1997) The importance of recent ice ages in speciation: a failed

paradigm. Science, 277, 1666–1669.

Knowles LL, Maddison WP (2002) Statistical phylogeography. Molecular Ecology, 11,

2623–2635.

Knowlton N, Weigt LA (1998) New dates and new rates for divergence across the

Isthmus of Panama. Proceedings of the Royal Society B: Biological Sciences, 265,

2257–2263.

Koopman MM, Carstens BC (2010) Conservation genetic inferences in the carnivorous

pitcher plant Sarracenia alata (Sarraceniaceae). Conservation Genetics, 11,

2027–2038.

Koopman MM, Fuselier DM, Hird S, Carstens BC (2010) The carnivorous pale pitcher

plant harbors diverse, distinct, and time-dependent bacterial communities. Applied

and Environmental Microbiology, 76, 1851–1860.

Koopman MM, Carstens BC (2011) The microbial phyllogeography of the carnivorous

plant Sarracenia alata. Microbial Ecology, 61, 750–758.

Koski LB, Golding GB (2001) The closest BLAST hit is often not the nearest neighbor.

Journal of Molecular Evolution, 52, 540–542.

Krawchuk MA, Taylor PD (2003) Changing importance of habitat structure across

multiple spatial scales for three species of insects. Oikos, 103, 153–161.

Kubatko LS, Carstens BC, Knowles LL (2009) STEM: species tree estimation using

maximum likelihood for gene trees under coalescence. Bioinformatics, 25, 971–

973.

Lapointe FJ, Rissler LJ (2005) Congruence, consensus, and the comparative

phylogeography of codistributed species in California. The American Naturalist,

166, 290–299.

Larget BR, Kotha SK, Dewey CN, Ané C (2010) BUCKy: Gene tree/species tree

reconciliation with bayesian concordance analysis. Bioinformatics, 26, 2910–

2911.

Leaché AD, Crews SC, Hickerson MJ (2007) Two waves of diversification in mammals

and reptiles of Baja California revealed by hierarchical Bayesian analysis. Biology

Letters, 3, 646–650.

Leaché AD, Fujita MK (2010) Bayesian species delimitation in west African forest

geckos (Hemidactylus fasciatus). Proceedings of the Royal Society B: Biological

Sciences, 277, 3071–3077.

Lee RE, Chen C, Denlinger DL (1987) A rapid cold-hardening process in insects.

Science, 238, 1415–1417.

Leigh EG, O'Dea A, Vermeij GJ (2014) Historical biogeography of the Isthmus of

Panama. Biological Reviews, 89, 148–172.

Lessios HA (2008) The great American schism: divergence of marine organisms after the

rise of the Central American Isthmus. Annual Review of Ecology, Evolution, and

Systematics, 39, 63–91.

Lindell J, Ngo A, Murphy RW (2006) Deep genealogies and the mid-peninsular seaway

of Baja California. Journal of Biogeography, 33, 1327–1331.

Lischer HEL, Excoffier L (2012) PGDSpider: an automated data conversion tool for

connecting population genetics and genomics programs. Bioinformatics, 28, 298–

299.

Maddison DR, Maddison WP (2005) MacClade 4: analysis of phylogeny and character

evolution. Version 4.08a. Available from: http://macclade.org.

Mann CJ, Thomas WA (1968) The ancient Mississippi River. Gulf Coast Association of

Geological Societies Transactions, 18, 187–204.

Mardis ER (2008) Next-generation DNA sequencing methods. Annual Review of

Genomics and Human Genetics, 9, 387–402.

Marshall LG, Webb SD, Sepkoski JJ, Raup DM (1982) Mammalian evolution and the

great American interchange. Science, 215, 1351–1357.

McCormack JE, Heled J, Delaney KS, Peterson AT, Knowles LL (2011) Calibrating

divergence times on species trees versus gene trees: implications for speciation

history of Aphelocoma jays. Evolution, 65, 184–202.

McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT (2013) Applications

of next-generation sequencing to phylogeography and phylogenetics. Molecular

Phylogenetics and Evolution, 66, 526–538.

McKenna DD, Sequeira AS, Marvaldi AE, Farrell BD (2009) Temporal lags and overlap

in the diversification of weevils and flowering plants. Proceedings of the National

Academy of Sciences, USA, 106, 7083–7088.

McPherson S (2007) Pitcher plants of the Americas. The McDonald and Woodward

Publishing Company, Blacksburg, VA.

Miller TE, Kneitel JM (2005) Inquiline communities in pitcher plants as a prototypical

metacommunity. In: Metacommunities: spatial dynamics and ecological

communities (eds Holyoak M, Leibold MA, Holt R) University of Chicago Press,

Chicago, IL, pp. 122–145.

Miller TE, terHorst CP (2012) Testing successional hypotheses of stability,

heterogeneity, and diversity in pitcher-plant inquiline communities. Oecologia,

170, 243–251.

Moon DC, Rossi A, Stokes K, Moon J (2008) Effects of the pitcher plant mining moth

Exyra semicrocea on the hooded pitcher plant Sarracenia minor. American Midland

Naturalist, 159, 321–326.

Muma MH, Denmark HA (1967) Biological studies on Macroseius biscutatus (Acarina:

Phytoseiidae). Florida Entomologist, 50, 249–255.

Nei M (1973) Analysis of gene diversity in subdivided populations. Proceedings of the

National Academy of Sciences, USA, 70, 3321–3323.

Nelson G, Platnick NI (1981) Systematics and biogeography – cladistics and vicariance.

Columbia University Press, New York, New York.

Newell SJ, Nastase AJ (1998) Efficiency of insect capture by Sarracenia purpurea

(Sarraceniaceae), the northern pitcher plant. American Journal of Botany, 85, 88–

91.

Newman CE, Rissler LJ (2011) Phylogeographic analyses of the southern leopard frog:

the impact of geography and climate on the distribution of genetic lineages vs.

subspecies. Molecular Ecology, 20, 5295–5312.

Nielsen R (2000) Estimation of population parameters and recombination rates from

single nucleotide polymorphisms. Genetics, 154, 931–942.

Noss RF (1989) Longleaf pine and wiregrass: keystone components of an endangered

ecosystem. Natural Areas Journal, 9, 211–213.

Noss RF, LaRoe III ET, Scott JM (1995) Endangered ecosystems of the United States: A

preliminary assessment of loss and degradation. Vol. 28. Washington, D.C.,

U.S.A.: U. S. Department of the Interior, National Biological Service.

O’Brien HE, Parrent JL, Jackson JA, Moncalvo JM, Vilgalys R (2005) Fungal

community analysis by large-scale sequencing of environmental samples. Applied

and Environmental Microbiology, 71, 5544–5550.

Oaks JR, Sukumaran J, Esselstyn JA, Linkem CW, Siler CD, Holder MT, Brown RM

(2012) Evidence for climate-driven diversification? a caution for interpreting

ABC inferences of simultaneous historical events. Evolution, 67, 991–1010.

Oaks JR (2014) An improved approximate-Bayesian model-choice method for estimating

shared evolutionary history. BMC Evolutionary Biology, 14, 150.

Oaks JR, Linkem CW, Sukumaran J (2014) Implications of uniformly distributed,

empirically informed priors for phylogeographical model selection: a reply to

Hickerson et al. Evolution, 68, 3607–3617.

Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL,

Solymos P, Stevens MHH, Wagner H (2015) vegan: community ecology package.

R package version 2.2-1. http://CRAN.R-project.org/package=vegan

Page TJ, Hughes JM (2014) Contrasting insights provided by single and multispecies

data in a regional comparative phylogeographic study. Biological Journal of the

Linnean Society, 111, 554–569.

Papadopoulou A, Knowles LL (2015) Species-specific responses to island connectivity

cycles: refined models for testing phylogeographic concordance across a

Mediterranean Pleistocene aggregate island complex. Molecular Ecology, 24,

4252–4268.

Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in

R language. Bioinformatics, 20, 289–290.

Paradis E (2010) pegas: an R package for population genetics with an integrated-modular

approach. Bioinformatics, 26, 419–420.

Peay KB, Kennedy PG, Bruns TD (2008) Fungal community ecology: a hybrid beast

with a molecular master. Bioscience, 58, 799–810.

Pelletier TA, Carstens BC (2014) Model choice for phylogeographic inference using a

large set of models. Molecular Ecology, 23, 3028–3043.

Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest

RADseq: an inexpensive method for de novo SNP discovery and genotyping in

model and non-model species. PLoS ONE, 7, e37135.

Peterson CN, Day S, Wolfe BE, Ellison AM, Kolter R, Pringle A (2008) A keystone

predator controls bacterial diversity in the pitcher-plant (Sarracenia purpurea)

microecosystem. Environmental Microbiology, 10, 2257–2266.

Porter SD, Savignano DA (1990) Invasion of polygyne fire ants decimates native ants

and disrupts arthropod community. Ecology, 71, 2095–2106.

Posada D (2008) jModelTest: phylogenetic model averaging. Molecular Biology and

Evolution, 25, 1253–1256.

Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using

multilocus genotype data. Genetics, 155, 945–959.

Puechmaille SJ (2016) The program structure does not reliably recover the correct

population structure when sampling is uneven: subsampling and new estimators

alleviate the problem. Molecular Ecology Resources, 16, 608–627.

Pyron RA, Burbrink FT (2009) Lineage diversification in a widespread species: roles for

niche divergence and conservatism in the common kingsnake, Lampropeltis

getula. Molecular Ecology, 18, 3443–3457.

Pyron RA, Burbrink FT (2010) Hard and soft allopatry: physically and ecologically

mediated modes of geographic speciation. Journal of Biogeography, 37, 2005–

2015.

R Core Team (2013) R: a language and environment for statistical computing. R

Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

R Core Team (2015) R: a language and environment for statistical computing. R

Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Rambaut A, Grassly NC (1997) Seq-Gen: an application for the Monte Carlo simulation of

DNA sequence evolution along phylogenetic trees. Computer Applications in the

Biosciences, 13, 235–238.

Rambaut A, Suchard MA, Xie D, Drummond AJ (2014) Tracer v1.6, Available from

http://beast.bio.ed.ac.uk/Tracer

Rango JJ (1999) Resource dependent larviposition behavior of a pitcher plant flesh fly,

Fletcherimyia fletcheri (Aldrich)(Diptera: Sarcophagidae). Journal of the New

York Entomological Society, 107, 82–86.

Rasic G, Keyghobadi N (2012) The pitcher plant flesh fly exhibits a mixture of patchy

and metapopulation attributes. Journal of Heredity, 103, 703–710.

Rehm P, Borner J, Meusemann K, von Reumont BM, Simon S, Hadrys H, Misof B,

Burmester T (2011) Dating the arthropod tree based on large-scale transcriptome

data. Molecular Phylogenetics and Evolution, 61, 880–887.

Richardson JE, Pennington RT, Pennington TD, Hollingsworth PM (2001) Rapid

diversification of a species-rich genus of Neotropical rain forest trees. Science,

293, 2242–2245.

Riddle BR, Hafner DJ, Alexander LF, Jaeger JR (2000) Cryptic vicariance in the

historical assembly of a Baja California peninsular desert biota. Proceedings of

the National Academy of Sciences, USA, 97, 14438–14443.

Robert CP, Cornuet J, Marin J, Pillai NS (2011) Lack of confidence in approximate

Bayesian computation model choice. Proceedings of the National Academy of

Sciences, USA, 108, 15112–15117.

Roe AD, Rice AV, Coltman DW, Cooke JEK, Sperling FAH (2010) Comparative

phylogeography, genetic differentiation and contrasting reproductive modes in

three fungal symbionts of a multipartite bark symbiosis. Molecular

Ecology, 20, 584–600.

Rokas A, Carroll SB (2005) More genes or more taxa? the relative contribution of gene

number and taxon number to phylogenetic accuracy. Molecular Biology and

Evolution, 22, 1337–1344.

Rønsted N, Weiblen GD, Cook JM, Salamin N, Machado CA, Savolainen V (2005) 60

million years of co-divergence in the fig-wasp symbiosis. Proceedings of the

Royal Society B: Biological Sciences, 272, 2593–2599.

Satler JD, Zellmer AJ, Carstens BC (2016) Biogeographic barriers drive co-

diversification within associated eukaryotes of the Sarracenia alata pitcher plant

system. PeerJ, 4, e1576.

Satler JD, Carstens BC (2016) Phylogeographic concordance factors quantify

phylogeographic congruence among co-distributed species in the Sarracenia alata

pitcher plant system. Evolution, 70, 1105–1119.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA,

Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn

DJ, Weber CF (2009) Introducing mothur: open-source, platform-independent,

community-supported software for describing and comparing microbial

communities. Applied and Environmental Microbiology, 75, 7537–7541.

da Silva MNF, Patton JL (1998) Molecular phylogeography and the evolution and

conservation of Amazonian mammals. Molecular Ecology, 7, 475–486.

Smith BT, Amei A, Klicka J (2012) Evaluating the role of contracting and expanding

rainforest in initiating cycles of speciation across the Isthmus of Panama.

Proceedings of the Royal Society B: Biological Sciences, 279, 3520–3526.

Smith BT, McCormack JE, Cuervo AM, Hickerson MJ, Aleixo A, Cadena CD, Pérez-

Emán J, Burney CW, Xie X, Harvey MG, Faircloth BC, Glenn TC, Derryberry

EP, Prejean J, Fields S, Brumfield RT (2014a) The drivers of tropical speciation.

Nature, 515, 406–409.

Smith BT, Harvey MG, Faircloth BC, Glenn TC, Brumfield RT (2014b) Target capture

and massively parallel sequencing of ultraconserved elements for comparative

studies at shallow evolutionary time scales. Systematic Biology, 63, 83–95.

Smith CI, Tank S, Godsoe W, Levenick J, Strand E, Esque T, Pellmyr O (2011)

Comparative phylogeography of a coevolved community: concerted population

expansions in Joshua Trees and four yucca moths. PLoS ONE, 6, e25628.

Soltis DE, Morris AB, McLachlan JS, Manos PS, Soltis PS (2006) Comparative

phylogeography of unglaciated eastern North America. Molecular Ecology, 15,

4261–4293.

Sota T, Kagata H, Ando Y, Utsumi S, Osono T (2014) Metagenomic approach yields

insights into fungal diversity and functioning. Species Diversity and Community

Structure (pp. 1–23). Springer Japan.

Spitzer K, Rejmánek M, Soldán T (1984) The fecundity and long-term variability in

abundance of noctuid moths (, ). Oecologia, 62, 91–93.

Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic

analyses with thousands of taxa and mixed models. Bioinformatics, 22, 2688–

2690.

Stamatakis A, Hoover P, Rougemont J (2008) A rapid bootstrap algorithm for the

RAxML web servers. Systematic Biology, 57, 758–771.

Stephens JD, Santos SR, Folkerts DR (2011) Genetic differentiation, structure, and a

transition zone among populations of the pitcher plant moth Exyra semicrocea:

implications for conservation. PLoS ONE, 6, e22658.

Stephens JD, Folkerts DR (2012) Life history aspects of Exyra semicrocea (pitcher plant

moth)(Lepidoptera: Noctuidae). Southeastern Naturalist, 11, 111–126.

Stephens JD, Rogers WL, Heyduk K, Cruse-Sanders JM, Determann RO, Glenn TC,

Malmberg RL (2015) Resolving phylogenetic relationships of the recently

radiated carnivorous plant genus Sarracenia using target enrichment. Molecular

Phylogenetics and Evolution, 85, 76–87.

Stuble KL, Kirkman LK, Carroll CR (2009) Patterns of abundance of fire ants and native

ants in a native ecosystem. Ecological , 34, 520–526.

Sukumaran J, Holder MT (2010) DendroPy: a Python library for phylogenetic computing.

Bioinformatics, 26, 1569–1571.

Sukumaran J, Holder MT (Jan 31 2015) SumTrees: phylogenetic tree summarization.

4.0.0. Available at https://github.com/jeetsukumaran/DendroPy.

Sullivan J, Arellano E, Rogers DS (2000) Comparative phylogeography of Mesoamerican

highland rodents: concerted versus independent response to past climatic

fluctuations. The American Naturalist, 155, 755–768.

Swenson NG, Howard DJ (2005) Clustering of contact zones, hybrid zones, and

phylogeographic breaks in North America. The American Naturalist, 166, 581–

591.

Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA

polymorphism. Genetics, 123, 585–595.

Taylor JT, Turner E, Townsend JP, Dettman JR, Jacobson D (2006) Eukaryotic microbes,

species recognition and the geographic limits of species: examples from the

kingdom fungi. Philosophical Transactions of the Royal Society B, 361, 1947–1963.

terHorst CP, Miller TE, Levitan DR (2010) Evolution of prey in ecological time reduces

the effect size of predators in experimental microcosms. Ecology, 91, 629–636.

Thomé MCT, Carstens BC (2016) Phylogeographic model selection leads to insight into

the evolutionary history of four-eyed frogs. Proceedings of the National Academy

of Sciences, USA, in press.

Tringe SG, Rubin EM (2005) Metagenomics: DNA sequencing of environmental

samples. Nature Reviews Genetics, 6, 805–814.

Upton DE, Murphy RW (1997) Phylogeny of the side-blotched lizards

(Phrynosomatidae: Uta) based on mtDNA sequences: support for a midpeninsular

seaway in Baja California. Molecular Phylogenetics and Evolution, 8, 104–113.

Vieites DR, Min MS, Wake DB (2007) Rapid diversification and dispersal during periods

of global warming by plethodontid salamanders. Proceedings of the National

Academy of Sciences, USA, 104, 19903–19907.

Wallace AR (1852) On the monkeys of the Amazon. Proceedings of the Zoological

Society of London, 20, 107–110.

Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a

versatile toolkit for approximate Bayesian computations. BMC Bioinformatics,

11, 116.

Weiblen GD, Bush GL (2002) Speciation in fig pollinators and parasites. Molecular

Ecology, 11, 1573–1578.

Weisrock DW, Janzen FJ (2000) Comparative molecular phylogeography of North

American softshell turtles (Apalone): implications for regional and wide-scale

historical evolutionary forces. Molecular Phylogenetics and Evolution, 14, 152–

164.

Wheat CW, Vogel H, Wittstock U, Braby MF, Underwood D, Mitchell-Olds T (2007)

The genetic basis of a plant-insect coevolutionary key innovation. Proceedings of

the National Academy of Sciences, USA, 104, 20427–20431.

Whiteman NK, Kimball RT, Parker PG (2007) Co-phylogeography and comparative

population genetics of the threatened Galápagos hawk and three ectoparasite

species: ecology shapes population histories within parasite communities.

Molecular Ecology, 16, 4759–4773.

Zellmer AJ, Hanes MM, Hird SM, Carstens BC (2012) Deep phylogeographic structure

and environmental differentiation in the carnivorous plant Sarracenia alata.

Systematic Biology, 61, 763–777.

Zimmerman NB, Vitousek PM (2012) Fungal endophyte communities reflect

environmental structuring across a Hawaiian landscape. Proceedings of the

National Academy of Sciences, USA, 109, 13022–13027.

Zwickl DJ (2006) Genetic algorithm approaches for the phylogenetic analysis of large

biological sequence datasets under the maximum likelihood criterion. Ph.D.

dissertation, The University of Texas at Austin.

Appendix A: Chapter 2 Supplemental Material

A.1. Sampling distribution for each OTU. Values represent the number of sequences from the corresponding locale.

                  West              East
Code             C     K     A     L     T
Fungi1           0    21     0    26     5
Fungi2           0     7     3    40     1
Fungi3           4     4     2     9     3
Fungi4         270   170   302  1458   307
Fungi5           7    15    12    50    13
Fungi6          19    12    20    92    25
Fungi7          14     5    12    48     5
Fungi8           7     4     7    35     4
Fungi9           5     2     4    16     3
Fungi10          5     3     5    33     8
Fungi11         18     0    66    74    31
Fungi12          0     9     1     4     0
Fungi13          3     1     2    29     5
Fungi14         73    36    75   463   119
Fungi15          0     5     7     3     0
Amoebozoa1      29     3    32    71    92
Alveolata1      12     0     0     6     0
Nematoda1        0    68     8     1     2
Nematoda2        0   124    29     1     0
Nematoda3        0    18     3     0     0
Insect1         25    13     0    23     0
Insect2          9     6     0    18     4
Insect3          2    32     0     6     1
Mite1          144    71    34   509    70
Mite2            2     2     4    21     1
Mite3          318   132   173   378    70
Mite4           16     5    12    16     7
Mite5           17     7     1     5     4
Mite6            6     9    11    18     6
Mite7           19     7     4    10     5
Unknown          3     1     0    60     2

A.2. Results from AMOVA, GSI and IBD analyses. AMOVA analyses show the hierarchical partitioning scheme of locales within regions (ΦSC), locales within total distribution (ΦST), and between regions (ΦCT). Mantel r values are reported for the IBD analyses. Asterisks denote significant values (* < 0.05 – 0.01; ** < 0.01 – 0.001; *** < 0.001).

Taxa        ΦSC         ΦST         ΦCT         GSIE        GSIW        IBD
Fungi1      -0.0790     0.0171      0.0891      0.1905      0.1026      0.8911
Fungi2      -0.1873     0.1595**    0.2920      0.2560*     0.2647**    0.6555
Fungi3      -0.0266     0.2030**    0.2237      0.1711      0.3750*     0.6687
Fungi4      0.0035***   0.0025**    -0.0010     0.0200      0.0400      -0.2124
Fungi5      0.0369      0.0472*     0.0108      0.1848**    0.0326      0.0021
Fungi6      -0.0021     -0.0170     -0.0149     0.1077      0.1919***   -0.5135
Fungi7      0.0350      0.0115      -0.0243     0           0.0468      -0.0657
Fungi8      -0.0340     0.0747*     0.1051      0.0744      0.0473      0.6378
Fungi9      0.1850      0.0844      -0.1235     0           0.0417      -0.2010
Fungi10     0.0200      -0.0172     -0.0380     0           0.0546      -0.0680
Fungi11     -0.0041     -0.0043     -0.0001     0           0.0269      0.0450
Fungi12     0.5276      0.5914***   0.1351      1***        0.7111**    0.8576
Fungi13     0.0277      0.0382      0.0109      0           0.0972      0.6806**
Fungi14     0.0063*     0.0152***   0.0089      0.0158      0.0421      0.5492*
Fungi15     -0.0538     -0.0691     -0.0145     0           0.0667      -0.5888
Amoebozoa1  0.0006      -0.0043     -0.0048     0           0.0163      -0.4741
Alveolata1  NA          NA          NA          0.1736      0           NA
Nematoda1   -0.3959     -0.0960     0.2148      0.0474      0           -0.0356
Nematoda2   0.4389      0.0147      -0.7560     0.1037      0.0270      -0.6878
Nematoda3   NA          NA          NA          0.1111      0           NA
Insect1     0.6965***   0.4416***   -0.8399     0.1930**    0.0555      -0.3276
Insect2     0.0548      0.0537      -0.0011     0.1750      0.0579      0.4669
Insect3     0.7490***   0.4277***   -1.2799     0.1597      0.1209      0.1049
Mite1       0.0099**    0.0198***   0.0100      0.1220***   0.0773**    0.1900
Mite2       -0.1093     -0.0400     0.0625      0.2232      0.0938      0.2335
Mite3       0.0098***   0.0232***   0.0135      0.2108***   0.1017***   0.6383
Mite4       0.0181      0.0313      0.0134      0.2361*     0.1270      0.1497
Mite5       -0.1100     -0.1131     -0.0028     0.1406      0           0.3589
Mite6       0.1470**    0.1250**    -0.0257     0.1478      0.1600      -0.1140
Mite7       0.0447      0.0194      -0.0265     0.0507      0.0963      0.3238
Unknown     -0.1937     -0.1484     0.0379      0.0681      0           0.2020
Host plant  0.7342***   0.9190***   0.6952**    1***        1***        0.6427**
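
As a reading aid for the AMOVA columns in A.2, the three Φ-statistics of a two-level hierarchy are conventionally linked by (1 − ΦST) = (1 − ΦSC)(1 − ΦCT). The short Python sketch below is not part of the original analyses. It uses three rows copied from the table, and the function name `phi_st_from_components` is a hypothetical helper introduced only for illustration. It checks that the reported ΦST values agree with this relation to within rounding.

```python
# Values copied from Table A.2 as (PhiSC, PhiST, PhiCT); only three taxa are shown.
rows = {
    "Fungi1": (-0.0790, 0.0171, 0.0891),
    "Insect3": (0.7490, 0.4277, -1.2799),
    "Host plant": (0.7342, 0.9190, 0.6952),
}


def phi_st_from_components(phi_sc, phi_ct):
    """Return the PhiST implied by (1 - PhiST) = (1 - PhiSC) * (1 - PhiCT).

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    return 1.0 - (1.0 - phi_sc) * (1.0 - phi_ct)


for taxon, (phi_sc, phi_st, phi_ct) in rows.items():
    implied = phi_st_from_components(phi_sc, phi_ct)
    # Reported values are rounded to four decimals, so small differences are expected.
    print(f"{taxon}: reported PhiST = {phi_st:.4f}, implied PhiST = {implied:.4f}")
```

Under this relation, a strongly negative ΦCT (as for Insect3) combined with a large ΦSC can still yield a moderate, positive ΦST, which may help when comparing rows in the table.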

A.3. BLAST information for OTUs from each locality, including number of sequences (N) in the OTU and GenBank (GB) identifier number. Taxonomic information includes higher level grouping (see Fig. 3), plus the lowest taxonomic information available from the corresponding BLAST hit.

OTU  N  GB Identifier  E-value  % Identity  Taxonomy
A (Abita Springs)
OTU_A_11  7  KP192855  1.48E-140  100  Fungi Nigrospora
OTU_A_16  1  HQ634655  8.92E-138  99.63  Fungi unclassified Chaetothyriales
OTU_A_19  2  JQ403623  2.48E-138  99.64  Fungi Alternaria
OTU_A_21  4  KR873276  1.48E-140  100  Fungi Pestalotiopsis
OTU_A_22  1  KJ130983  4.27E-116  94.91  Fungi Simplicillium
OTU_A_23  1  AB378534  2.48E-138  99.64  Fungi Simplicillium
OTU_A_27  1  EF413604  2.50E-133  98.55  Fungi Capronia
OTU_A_30  3  KP122800  2.48E-138  99.64  Fungi Fusarium fujikuroi species complex
OTU_A_39  2  AF033439  8.92E-138  99.63  Fungi Penicillium
OTU_A_45  1  KC455262  7.04E-124  96.72  Fungi Cyphellophora
OTU_A_2  331  KM246049  1.48E-140  100  Fungi Candida glaebosa clade
OTU_A_28  2  KF906511  4.24E-121  95.7  Fungi Lodderomyces
OTU_A_42  28  KM246049  4.18E-131  98.18  Fungi Candida glaebosa clade
OTU_A_40  1  JF773599  1.48E-140  100  Fungi
OTU_A_18  1  KM555210  7.04E-124  97.03  Fungi Trichosporon
OTU_A_24  3  JN939448  6.90E-139  99.64  Fungi Trichosporon
OTU_A_35  1  JN649352  1.48E-140  100  Fungi Phlebia
OTU_A_44  3  KJ936991  1.15E-136  99.27  Fungi Tricholoma
OTU_A_48  1  KM267736  1.49E-135  98.91  Fungi Auricularia
OTU_A_50  1  JQ768872  2.53E-123  96.36  Fungi Cryptococcus
OTU_A_51  1  KF691613  1.49E-135  98.91  Fungi Dentocorticium
OTU_A_53  1  EU232298  1.15E-136  99.27  Fungi Fibroporia
OTU_A_38  1  AY745717  2.04E-94  89.96  Fungi Sporobolomyces
OTU_A_46  1  KM555204  3.21E-137  99.27  Fungi Pseudozyma
OTU_A_8  68  KM056329  1.48E-140  100  Fungi Mucor
OTU_A_33  1  JN206525  1.49E-135  98.92  Fungi Backusella
OTU_A_43  1  JN206497  9.05E-128  97.12  Fungi Mucor
OTU_A_47  1  KM242293  6.99E-129  97.46  Fungi Mucor
OTU_A_13  15  JQ311373  6.90E-139  99.64  Fungi environmental samples
OTU_A_17  5  KF568245  1.49E-135  98.91  Fungi environmental samples
OTU_A_20  2  KF565450  1.48E-140  100  Fungi environmental samples
OTU_A_41  1  KF567809  2.50E-133  98.55  Fungi environmental samples
OTU_A_52  1  KF568247  1.48E-140  100  Fungi environmental samples
OTU_A_4  75  KF436392  5.37E-135  98.91  Fungi
OTU_A_1  198  JQ000370  1.52E-120  95.64  Acari Anoetus
OTU_A_5  39  JQ000359  1.94E-129  97.45  Acari Ovanoetus
OTU_A_10  8  JQ000343  1.96E-124  96.39  Acari unclassified Brachychthoniidae
OTU_A_34  1  JQ000370  3.32E-112  94.18  Acari Anoetus
OTU_A_37  1  JQ000370  5.56E-110  93.8  Acari Anoetus
OTU_A_3  36  EU392316  1.48E-140  100  Insecta Coccinella
OTU_A_6  18  JF439579  6.90E-139  99.64  Insecta Isomyia
OTU_A_15  1  EU392316  3.25E-127  97.45  Insecta Coccinella
OTU_A_36  1  KJ860019  1.48E-140  100  Insecta Stenamma
OTU_A_12  40  DQ145650  7.72E-59  83.65  Nematoda

OTU_A_25  1  DQ145650  7.83E-49  81.41  Nematoda
OTU_A_14  4  KJ877269  6.36E-15  93.55  Nematoda Teratodiplogaster
OTU_A_9  3  FJ654281  1.51E-125  96.73  Rhizaria Cercozoa
OTU_A_7  34  AJ584697  4.81E-31  76.9  Amoebozoa Fuligo
OTU_A_26  2  JQ519384  6.90E-139  99.64  Viridiplantae Sarracenia
OTU_A_29  1  NR_076660  1.65E-65  83.7  Bacteria Pedobacter
OTU_A_31  1  NR_076660  9.92E-63  83.15  Bacteria Pedobacter
OTU_A_32  1  NR_076660  9.99E-58  82.08  Bacteria Pedobacter
OTU_A_49  1  CP010050  1.52E-120  96.96  Bacteria Lactococcus
C (Cooter’s Bog)
OTU_C_38  1  JQ622099  9.11E-123  97.01  Fungi Ramichloridium
OTU_C_39  1  AB986419  1.48E-140  100  Fungi Cladophialophora
OTU_C_40  1  HG421426  4.15E-136  99.27  Fungi Aureobasidium
OTU_C_1  317  KM246049  1.48E-140  100  Fungi Candida glaebosa clade
OTU_C_29  1  EF463105  3.21E-137  100  Fungi Lachancea
OTU_C_31  1  KM246049  5.45E-125  97.09  Fungi Candida glaebosa clade
OTU_C_36  1  KM246049  1.17E-126  97.45  Fungi Candida glaebosa clade
OTU_C_17  4  KP322771  6.90E-139  99.64  Fungi Ascomycota
OTU_C_25  1  EU490092  9.05E-128  97.44  Fungi Ascomycota
OTU_C_20  4  KP013050  3.26E-127  97.09  Fungi Polyporaceae
OTU_C_28  2  AB712269  1.04E-27  76.43  Fungi Agaricomycetes
OTU_C_46  1  KP013050  2.53E-123  96.38  Fungi Polyporaceae
OTU_C_34  1  KM370119  6.90E-139  99.64  Fungi Malassezia
OTU_C_8  69  JQ311233  7.62E-69  84.59  Fungi environmental samples
OTU_C_15  1  KF567957  1.99E-114  94.55  Fungi environmental samples
OTU_C_16  8  JX534900  4.27E-116  98.35  Fungi environmental samples
OTU_C_23  2  KF566541  5.49E-120  95.64  Fungi environmental samples
OTU_C_33  1  KF568595  1.63E-75  86.72  Fungi environmental samples
OTU_C_10  18  KM056329  2.48E-138  99.64  Fungi Mucor
OTU_C_6  73  KF436392  5.37E-135  98.91  Fungi
OTU_C_30  1  KF435295  4.18E-131  98.88  Fungi
OTU_C_2  350  JQ000370  1.52E-120  95.64  Acari Anoetus
OTU_C_3  147  JQ000359  1.95E-129  97.45  Acari Ovanoetus
OTU_C_9  6  JQ000368  1.54E-115  94.18  Acari Sellea
OTU_C_21  1  JQ000370  2.61E-103  92.36  Acari Anoetus
OTU_C_22  24  JQ000370  2.57E-113  94.51  Acari Anoetus
OTU_C_26  1  JQ000370  5.60E-105  92.73  Acari Anoetus
OTU_C_44  1  JQ000370  5.64E-100  93.68  Acari Anoetus
OTU_C_4  36  EF012972  1.48E-140  100  Insecta Brachymyrmex
OTU_C_12  9  EF013063  5.37E-135  98.91  Insecta Solenopsis
OTU_C_5  70  EU145710  1.25E-81  87.5  Insecta Uleiota
OTU_C_43  2  FJ409577  1.78E-10  88.71  Bryozoa Flustrellidra
OTU_C_48  3  JX545288  6.01E-55  83.61  Annelida Plutellus
OTU_C_41  1  4W1Z_5  1.48E-140  100  Chordata Sus
OTU_C_11  12  KC832958  6.94E-134  98.55  Alveolata Leptopharynx
OTU_C_27  3  KC832958  4.21E-126  97.09  Alveolata Leptopharynx
OTU_C_35  1  KC832958  4.24E-121  96  Alveolata Leptopharynx
OTU_C_42  1  KC832958  5.89E-70  84.78  Alveolata Leptopharynx
OTU_C_7  11  AJ584697  8.00E-34  77.42  Amoebozoa Fuligo
OTU_C_37  20  AJ584697  1.73E-30  77.04  Amoebozoa Fuligo
OTU_C_32  4  GU928192  8.29E-09  93.75  environmental samples
OTU_C_14  1  NR_076660  2.15E-59  82.31  Bacteria Pedobacter

OTU_C_24  4  NR_076660  6.01E-55  81.59  Bacteria Pedobacter
OTU_C_45  1  CP002545  1.72E-35  77.82  Bacteria Pseudopedobacter
OTU_C_13  3  NO MATCH WITH THESE SEARCH PARAMETERS
OTU_C_18  2  NO MATCH WITH THESE SEARCH PARAMETERS
OTU_C_19  6  NO MATCH WITH THESE SEARCH PARAMETERS
OTU_C_47  2  NO MATCH WITH THESE SEARCH PARAMETERS
K (Kisatchie)
OTU_K_9  5  KP306939  2.48E-138  99.64  Fungi Pestalotiopsis
OTU_K_10  5  KP192855  1.48E-140  100  Fungi Nigrospora
OTU_K_12  21  KP143685  6.90E-139  99.64  Fungi Cladosporium
OTU_K_17  9  KP122800  1.48E-140  100  Fungi Fusarium fujikuroi species complex
OTU_K_18  9  HG965074  1.48E-140  100  Fungi Sarocladium
OTU_K_24  1  AB808244  5.52E-115  94.93  Fungi Periconia
OTU_K_25  3  JF499862  2.48E-138  99.64  Fungi Penidiella
OTU_K_33  1  JQ649347  7.00E-129  97.45  Fungi unclassified Hypocreales
OTU_K_39  1  JQ889293  2.52E-128  97.46  Fungi Dinemasporium
OTU_K_44  1  KM386435  2.50E-133  99.25  Fungi Aspergillus
OTU_K_45  1  AF033439  5.37E-135  98.91  Fungi Penicillium
OTU_K_50  2  JQ922194  3.28E-122  96.01  Fungi unclassified Hypocreales
OTU_K_56  1  GQ153025  2.48E-138  99.64  Fungi unclassified
OTU_K_67  2  KM272363  7.00E-129  97.45  Fungi Asteromyces
OTU_K_71  1  JF449494  5.41E-130  98.51  Fungi Pezizomycotina
OTU_K_72  1  KP401925  2.53E-123  96.39  Fungi Acremonium
OTU_K_2  205  KM246049  2.48E-138  99.64  Fungi Candida glaebosa clade
OTU_K_42  4  KM246049  1.50E-130  98.18  Fungi Candida glaebosa clade
OTU_K_62  1  EU490168  3.21E-137  99.28  Fungi environmental samples
OTU_K_55  1  JF925333  2.48E-138  99.64  Fungi Peniophora
OTU_K_15  12  KJ917974  1.48E-140  100  Fungi Hannaella
OTU_K_58  1  KC798409  1.49E-135  98.92  Fungi Cryptococcus
OTU_K_65  1  KM555190  1.53E-120  95.65  Fungi Cryptococcus
OTU_K_66  1  KM246145  1.50E-130  98.17  Fungi Cryptococcus
OTU_K_36  1  KJ847249  1.48E-140  100  Fungi Malassezia
OTU_K_41  1  KC439362  3.21E-137  99.27  Fungi Pseudozyma
OTU_K_35  1  AB520301  4.18E-131  98.18  Fungi environmental samples
OTU_K_37  1  KF567809  9.05E-128  97.78  Fungi environmental samples
OTU_K_43  1  KC557503  1.49E-135  98.91  Fungi environmental samples
OTU_K_46  3  KF567183  1.50E-130  97.51  Fungi environmental samples
OTU_K_52  1  KC965452  3.32E-112  94.49  Fungi environmental samples
OTU_K_63  1  KF567183  9.18E-118  96.28  Fungi environmental samples
OTU_K_68  1  KJ173582  1.93E-134  99.63  Fungi environmental samples
OTU_K_74  1  KC965452  3.28E-122  96.34  Fungi environmental samples
OTU_K_60  3  JN206504  1.48E-140  100  Fungi Mucor
OTU_K_64  1  JN206518  3.26E-127  97.09  Fungi Mucor
OTU_K_7  36  KF436392  2.50E-133  98.55  Fungi
OTU_K_61  1  KR080791  2.48E-138  99.64  Fungi
OTU_K_1  160  JQ000370  1.53E-120  95.64  Acari Anoetus
OTU_K_4  74  JQ000359  1.95E-129  97.45  Acari Ovanoetus
OTU_K_14  3  JQ000368  3.32E-112  93.45  Acari Sellea
OTU_K_38  1  JQ000370  2.59E-108  93.45  Acari Anoetus
OTU_K_13  34  AF134363  9.38E-103  91.76  Araneae Coelotes
OTU_K_19  19  JX145769  3.35E-107  97.05  Araneae Marpissinae
OTU_K_23  5  JX145745  7.05E-124  99.6  Araneae Thiodina
OTU_K_28  2  JQ312076  2.03E-99  91.82  Araneae Philodromus
OTU_K_32  7  AY210461  2.02E-104  92.09  Araneae Mecaphesa
OTU_K_59  1  JX145745  3.30E-117  97.63  Araneae Thiodina
OTU_K_73  5  JX145769  7.30E-99  94.94  Araneae Marpissinae
OTU_K_75  1  JQ312076  1.60E-85  88.73  Araneae Philodromus
OTU_K_8  16  EU145710  1.25E-81  87.5  Insecta Uleiota
OTU_K_21  19  KJ845077  7.40E-89  88.77  Insecta Glypholoma
OTU_K_51  2  KC177760  1.49E-135  98.91  Insecta Minettia
OTU_K_5  51  EF013034  1.48E-140  100  Insecta Nylanderia
OTU_K_6  73  AY703610  1.48E-140  100  Insecta Pseudomyrmex
OTU_K_11  3  DQ353637  7.00E-129  98.14  Insecta Crematogaster
OTU_K_20  2  GQ374781  2.52E-128  97.46  Insecta Coccophagus
OTU_K_22  2  DQ353637  1.18E-121  96  Insecta Crematogaster
OTU_K_26  1  EF013063  7.00E-129  97.45  Insecta Solenopsis
OTU_K_27  1  EF012984  1.97E-119  96.28  Insecta Crematogaster
OTU_K_29  4  KJ860020  4.30E-111  93.5  Insecta Stenamma
OTU_K_34  1  AY703589  2.52E-128  97.45  Insecta Pseudomyrmex
OTU_K_40  1  AY703610  1.51E-125  96.74  Insecta Pseudomyrmex
OTU_K_47  2  EF012984  3.28E-122  95.74  Insecta Crematogaster
OTU_K_48  2  GQ374778  3.21E-137  99.27  Insecta Eurytoma
OTU_K_54  1  GQ374733  7.00E-129  98.14  Insecta Ceraphron
OTU_K_57  7  FJ939794  3.21E-137  99.27  Insecta Forelius
OTU_K_16  2  EF622908  8.17E-19  79.19  Insecta Austrocercella
OTU_K_31  1  FJ411426  6.95E-134  98.55  Collembola Bourletiella
OTU_K_3  82  DQ145650  7.72E-59  83.65  Nematoda
OTU_K_30  126  DQ145650  6.01E-55  82.89  Nematoda

OTU_K_70  2  DQ145650  2.80E-53  82.4  Nematoda
OTU_K_53  4  AJ584697  6.19E-35  77.7  Amoebozoa Fuligo
OTU_K_69  1  KC183332  3.37E-102  92.57  Euglenozoa Trypanosomatidae
OTU_K_49  1  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
L (Lake Ramsey)
OTU_L_14  5  KR094461  2.62E-138  99.64  Fungi Epicoccum
OTU_L_15  4  KP963944  1.57E-140  100  Fungi Cladosporium
OTU_L_25  6  HG421426  1.22E-136  99.27  Fungi Aureobasidium
OTU_L_68  2  KT236303  1.56E-140  100  Fungi Alternaria
OTU_L_75  1  HM595588  2.66E-128  97.45  Fungi unclassified Myriangiaceae
OTU_L_16  20  KM099503  7.30E-139  99.64  Fungi Penicillium
OTU_L_58  1  AF033439  3.41E-132  98.53  Fungi Penicillium
OTU_L_80  1  KM012005  1.23E-131  98.18  Fungi Talaromyces
OTU_L_17  28  KR094457  2.62E-138  99.64  Fungi Fusarium incarnatum-equiseti species complex
OTU_L_42  4  KR906694  1.59E-130  98.18  Fungi Fusarium fujikuroi species complex
OTU_L_50  1  KP325443  2.62E-138  99.64  Fungi Atrotorquata
OTU_L_73  1  FJ588240  7.28E-139  99.64  Fungi Colletotrichum
OTU_L_82  1  KJ130983  7.44E-124  96.38  Fungi Simplicillium
OTU_L_67  2  DQ248313  5.71E-130  97.82  Fungi Symbiotaphrina
OTU_L_1  662  NG_042771  2.62E-138  99.64  Fungi Candida glaebosa clade
OTU_L_24  1  KM246049  1.68E-90  89.49  Fungi Candida glaebosa clade
OTU_L_37  9  KM246049  3.44E-127  97.44  Fungi Candida glaebosa clade
OTU_L_39  1  KF826535  3.44E-127  97.09  Fungi Candida
OTU_L_45  3  KM246049  1.59E-130  97.52  Fungi Candida glaebosa clade
OTU_L_52  20  NG_042771  1.25E-121  96.38  Fungi Candida glaebosa clade
OTU_L_53  1  KM246049  1.32E-81  87.73  Fungi Candida glaebosa clade
OTU_L_56  176  KM246049  1.59E-130  98.18  Fungi Candida glaebosa clade
OTU_L_63  2  KM246049  1.24E-126  97.45  Fungi Candida glaebosa clade
OTU_L_65  1  KM246049  2.05E-129  97.82  Fungi Candida glaebosa clade
OTU_L_13  7  AF363650  3.39E-137  99.27  Fungi Kockovaella
OTU_L_32  2  KP825398  1.56E-140  100  Fungi Malassezia
OTU_L_33  2  KP780476  1.23E-131  98.18  Fungi Cryptococcus
OTU_L_43  1  AJ406447  2.04E-134  98.9  Fungi Tubulicrinis
OTU_L_46  1  EF363144  1.58E-135  98.91  Fungi Cryptococcus
OTU_L_48  3  KP297997  7.28E-139  99.64  Fungi Anthracocystis
OTU_L_72  1  AB638334  2.75E-103  92  Fungi Sporobolomyces
OTU_L_78  2  GU055678  5.08E-31  77.26  Fungi Blastocladiomycota
OTU_L_10  19  KF566189  1.59E-130  97.82  Fungi environmental samples
OTU_L_18  6  KF566557  3.39E-137  99.27  Fungi environmental samples
OTU_L_20  4  KF567167  6.76E-10  95.74  Fungi environmental samples
OTU_L_21  12  KF568245  3.39E-137  99.28  Fungi environmental samples
OTU_L_22  24  JQ311373  2.66E-128  97.45  Fungi environmental samples
OTU_L_23  2  JX898659  8.69E-14  89.71  Fungi environmental samples
OTU_L_28  4  KF567809  9.55E-128  97.45  Fungi environmental samples
OTU_L_29  5  JQ311373  5.67E-135  98.91  Fungi environmental samples
OTU_L_41  2  EU861693  1.29E-96  91.14  Fungi environmental samples
OTU_L_54  1  KF567865  9.83E-108  92.83  Fungi environmental samples
OTU_L_70  1  JN890104  7.38E-129  97.45  Fungi environmental samples
OTU_L_71  1  KF566230  8.63E-19  95.38  Fungi environmental samples
OTU_L_76  1  KF750514  1.56E-140  100  Fungi environmental samples
OTU_L_79  1  DQ365457  1.66E-100  91.3  Fungi environmental samples
OTU_L_8  59  KM056329  1.56E-140  100  Fungi Mucor
OTU_L_11  13  KM242293  1.56E-140  100  Fungi Mucor
OTU_L_47  3  KP311363  3.41E-132  98.18  Fungi Mortierella
OTU_L_64  1  JN206518  2.67E-123  96.36  Fungi Mucor
OTU_L_4  281  KF436392  5.68E-135  98.91  Fungi
OTU_L_26  5  KR080807  1.56E-140  100  Fungi
OTU_L_34  2  KF436392  2.10E-114  95.51  Fungi
OTU_L_40  1  KF436392  7.44E-124  96.73  Fungi
OTU_L_44  1  KF435681  1.56E-140  100  Fungi
OTU_L_59  1  KF436218  5.67E-135  98.91  Fungi
OTU_L_2  315  JQ000359  2.06E-129  97.45  Acari Ovanoetus
OTU_L_3  194  JQ000370  1.61E-120  95.64  Acari Anoetus
OTU_L_31  1  JQ000359  7.56E-114  94.55  Acari Ovanoetus
OTU_L_36  1  JQ000370  2.73E-108  93.09  Acari Anoetus
OTU_L_61  2  JQ000359  7.54E-114  94.18  Acari Ovanoetus
OTU_L_77  1  JQ000370  1.64E-105  92.75  Acari Anoetus
OTU_L_12  4  HQ832608  7.28E-139  99.64  Insecta Monochamus
OTU_L_38  5  KJ845168  9.69E-118  96.24  Insecta Adranes
OTU_L_74  1  KC177715  1.25E-116  94.91  Insecta Neurigona
OTU_L_27  4  DQ768533  2.64E-133  98.9  Insecta Colletes
OTU_L_35  1  EF012972  5.67E-135  98.91  Insecta Brachymyrmex
OTU_L_49  1  AF483385  1.59E-130  97.83  Collembola Orchesella
OTU_L_55  3  AF483396  7.54E-114  94.18  Collembola Sinella
OTU_L_19  5  KC832958  3.41E-132  98.19  Alveolata Leptopharynx
OTU_L_7  38  AJ584697  3.04E-33  77.26  Amoebozoa Fuligo
OTU_L_9  23  DQ340388  1.31E-86  88.49  Amoebozoa Polysphondylium
OTU_L_30  3  KM116463  6.14E-80  86.96  Viridiplantae Chloroidium

OTU_L_62  1  DQ646479  1.56E-140  100  Viridiplantae Sarracenia
OTU_L_51  1  NR_076660  1.76E-55  81.91  Bacteria Pedobacter
OTU_L_69  2  NR_076660  1.04E-67  84.17  Bacteria Pedobacter
OTU_L_81  1  NR_076660  4.87E-61  82.8  Bacteria Pedobacter
OTU_L_5  58  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
OTU_L_6  26  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
OTU_L_57  2  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
OTU_L_60  1  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
OTU_L_66  1  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
T (Talisheek)
OTU_T_7  5  KP143685  1.48E-140  100  Fungi Cladosporium
OTU_T_29  1  KJ869217  5.52E-115  95.15  Fungi Toxicocladosporium
OTU_T_6  63  JX863917  6.90E-139  99.64  Fungi Aspergillus
OTU_T_8  4  KF614894  6.90E-139  99.64  Fungi unclassified Chaetothyriales
OTU_T_9  10  KM099503  1.48E-140  100  Fungi Penicillium
OTU_T_15  3  KM012005  1.48E-140  100  Fungi Talaromyces
OTU_T_22  4  EF413604  5.45E-125  96.73  Fungi Capronia
OTU_T_64  2  JX863917  4.24E-121  96.36  Fungi Aspergillus
OTU_T_23  4  JN938867  2.53E-123  97.03  Fungi Stachybotrys
OTU_T_35  8  GU055527  1.50E-130  98.18  Fungi Hypocreales
OTU_T_38  1  KP122800  2.48E-138  99.64  Fungi Fusarium fujikuroi species complex
OTU_T_46  1  KP306939  2.48E-138  99.64  Fungi Pestalotiopsis
OTU_T_59  1  AB566331  3.21E-137  99.27  Fungi Phaeoacremonium
OTU_T_58  1  DQ248313  4.18E-131  97.83  Fungi Symbiotaphrina
OTU_T_1  338  KM246049  2.48E-138  99.64  Fungi Candida glaebosa clade
OTU_T_18  1  KM246049  5.41E-130  98.18  Fungi Candida glaebosa clade
OTU_T_20  22  KM246049  1.95E-129  98.16  Fungi Candida glaebosa clade
OTU_T_39  1  KF830184  1.96E-124  96.39  Fungi Candida
OTU_T_55  1  KM246049  1.51E-125  97.1  Fungi Candida glaebosa clade
OTU_T_32  9  HG421418  2.50E-133  98.55  Fungi Metschnikowia
OTU_T_30  2  AY293189  2.48E-138  99.64  Fungi Hypsizygus
OTU_T_33  1  KP012950  1.17E-126  97.09  Fungi Leiotrametes
OTU_T_56  1  AF347104  1.54E-115  94.55  Fungi Trichaptum
OTU_T_60  1  AF518636  6.90E-139  99.64  Fungi Fuscoporia
OTU_T_31  1  JN043592  9.18E-118  94.95  Fungi Biatoropsis
OTU_T_36  6  AY562140  2.50E-133  98.55  Fungi Cryptococcus
OTU_T_54  1  AY562140  1.96E-124  96.73  Fungi Cryptococcus
OTU_T_27  1  KM370119  7.00E-129  97.45  Fungi Malassezia
OTU_T_28  1  JN367325  1.48E-140  100  Fungi Ustanciosporium
OTU_T_44  2  KM555204  1.48E-140  100  Fungi Pseudozyma
OTU_T_53  1  JF816224  2.52E-128  98.86  Fungi Conidiobolus
OTU_T_17  2  KC557252  2.70E-78  86.69  Fungi environmental samples
OTU_T_24  14  KF566825  2.50E-133  99.25  Fungi environmental samples
OTU_T_25  8  KF567899  2.52E-128  97.45  Fungi environmental samples
OTU_T_41  5  KF750375  2.63E-98  91.21  Fungi environmental samples
OTU_T_43  2  KC558160  2.50E-133  98.55  Fungi environmental samples
OTU_T_47  1  JQ311682  1.55E-110  93.5  Fungi environmental samples
OTU_T_57  1  FJ456973  2.13E-64  83.75  Fungi environmental samples
OTU_T_62  1  AB520592  1.15E-136  99.27  Fungi environmental samples
OTU_T_63  1  KF568247  1.48E-140  100  Fungi environmental samples
OTU_T_10  7  JN206518  1.96E-124  96.39  Fungi Mucor
OTU_T_16  22  JN206600  1.48E-140  100  Fungi Cunninghamella
OTU_T_19  10  KF733344  9.05E-128  98.49  Fungi Mucor
OTU_T_49  21  KM056329  5.37E-135  98.91  Fungi Mucor
OTU_T_4  119  KF436392  5.37E-135  98.91  Fungi
OTU_T_40  1  KF435295  1.48E-140  100  Fungi
OTU_T_2  77  JQ000370  1.53E-120  95.64  Acari Anoetus
OTU_T_5  72  JQ000359  1.95E-129  97.45  Acari Ovanoetus
OTU_T_42  10  JQ000368  7.15E-114  93.84  Acari Sellea
OTU_T_61  5  JQ000370  1.20E-111  93.84  Acari Anoetus
OTU_T_11  15  KJ860019  5.33E-140  100  Insecta Stenamma
OTU_T_50  2  GU244889  3.26E-127  97.43  Insecta Neolarra
OTU_T_13  2  DQ145650  7.72E-59  83.65  Nematoda
OTU_T_45  1  JX839924  1.48E-140  100  Mollusca Neohelix
OTU_T_3  93  AJ584697  2.88E-33  77.26  Amoebozoa Fuligo
OTU_T_26  3  AJ584697  2.90E-28  76.56  Amoebozoa Fuligo
OTU_T_34  3  AJ584697  2.26E-24  81.7  Amoebozoa Fuligo
OTU_T_66  1  DQ340388  3.44E-87  88.49  Amoebozoa Polysphondylium
OTU_T_51  1  KM116463  2.68E-83  87.68  Viridiplantae Chloroidium
OTU_T_52  1  JQ519384  6.90E-139  99.64  Viridiplantae Sarracenia
OTU_T_21  1  CP011650  3.21E-137  99.28  Bacteria Enterobacter cloacae complex
OTU_T_37  3  CP003488  4.78E-36  87.41  Bacteria Providencia
OTU_T_65  1  NR_076660  2.16E-54  81.63  Bacteria Pedobacter
OTU_T_12  5  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
OTU_T_14  7  NO BLAST MATCH WITH THESE SEARCH PARAMETERS
OTU_T_48  2  NO BLAST MATCH WITH THESE SEARCH PARAMETERS

A.4. BLAST information for OTUs from the global data set, including the number of sequences (N) in the OTU and the GenBank (GB) identifier number. Taxonomic information includes the higher-level grouping, plus the lowest taxonomic information available from the corresponding BLAST hit.

Taxa  OTU  N  GB Identifier  E-value  % Identity  Taxonomy
Fungi1  OTU_12  52  KP143685  1.48E-140  100  Fungi Cladosporium
Fungi2  OTU_17  51  KP122800  1.48E-140  100  Fungi Fusarium fujikuroi species complex
Fungi3  OTU_43  22  KM246064  2.48E-138  99.64  Fungi Curvularia
Fungi4  OTU_1  2507  KM246049  2.48E-138  99.64  Fungi Candida glaebosa clade
Fungi5  OTU_48  97  KM246049  8.99E-133  98.55  Fungi Candida glaebosa clade
Fungi6  OTU_67  168  KM246049  5.41E-130  98.18  Fungi Candida glaebosa clade
Fungi7  OTU_121  84  KM246049  5.41E-130  98.52  Fungi Candida glaebosa clade
Fungi8  OTU_164  57  KF830174  1.55E-110  93.48  Fungi Candida
Fungi9  OTU_188  30  KM246049  1.50E-130  98.18  Fungi Candida glaebosa clade
Fungi10  OTU_300  54  KM246049  5.41E-130  98.18  Fungi Candida glaebosa clade
Fungi11  OTU_10  189  KM056329  1.48E-140  100  Fungi Mucor
Fungi12  OTU_147  14  KF567809  1.15E-136  99.27  Fungi environmental samples
Fungi13  OTU_241  40  KF750375  5.60E-105  92.36  Fungi environmental samples
Fungi14  OTU_4  766  KF436392  5.37E-135  98.91  Fungi
Fungi15  OTU_25  15  KP192855  1.48E-140  100  Fungi Nigrospora
Amoebozoa1  OTU_5  227  AJ584697  2.88E-33  77.26  Amoebozoa Fuligo
Alveolata1  OTU_35  18  KC832958  6.95E-134  98.55  Alveolata Leptopharynx
Nematoda1  OTU_7  79  DQ145650  7.73E-59  83.65  Nematoda
Nematoda2  OTU_159  154  DQ145650  6.01E-55  82.89  Nematoda
Nematoda3  OTU_291  21  DQ145650  1.29E-56  83.21  Nematoda
Insect1  OTU_8  61  EF012972  1.48E-140  100  Insecta Brachymyrmex
Insect2  OTU_13  37  EF013063  1.48E-140  100  Insecta Solenopsis
Insect3  OTU_263  41  EF013034  1.15E-136  99.27  Insecta Nylanderia
Mite1  OTU_3  828  JQ000359  1.95E-129  97.45  Acari Ovanoetus
Mite2  OTU_76  30  JQ000359  3.28E-122  96.01  Acari Ovanoetus
Mite3  OTU_2  1071  JQ000370  1.53E-120  95.64  Acari Anoetus
Mite4  OTU_105  56  JQ000370  3.33E-112  94.44  Acari Anoetus
Mite5  OTU_153  34  JQ000370  1.54E-115  95.2  Acari Anoetus
Mite6  OTU_202  50  JQ000370  5.52E-115  95.49  Acari Anoetus
Mite7  OTU_323  45  JQ000370  1.20E-111  93.84  Acari Anoetus
Unknown  OTU_9  66  NO BLAST MATCH WITH THESE SEARCH PARAMETERS

Appendix B: Chapter 3 Supplemental Material

B.1. Spatial principal components analysis (sPCA) of the six arthropods. The color plot is visualized on the landscape, with the space in the middle representing the Mississippi River (see Fig. 6 for reference). The x axis is longitude; the y axis is latitude. Colors represent genetic similarity, highlighting population structure in all species except the two spiders.

[Figure panels: A) E. semicrocea; B) M. biscutatus; C) S. sarraceniae; D) F. celarata; E) M. formosipes; F) P. viridans]

B.2. Maximum likelihood COI mtDNA gene tree for Sarcophaga sarraceniae. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). Appendix B.10 contains information regarding individual samples. All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Sarcophaga sarraceniae; scale bar = 0.0040 substitutions per site]

B.3. Maximum likelihood COI mtDNA gene tree for Fletcherimyia celarata. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). Appendix B.10 contains information regarding individual samples. All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Fletcherimyia celarata; scale bar = 0.0010 substitutions per site]

B.4. Maximum likelihood COI mtDNA gene tree for Macroseius biscutatus. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). Appendix B.10 contains information regarding individual samples. All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Macroseius biscutatus; scale bar = 0.0300 substitutions per site]

B.5. Maximum likelihood COI mtDNA gene tree for Exyra semicrocea. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). Appendix B.10 contains information regarding individual samples. All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Exyra semicrocea; scale bar = 0.0050 substitutions per site]

B.6. Maximum likelihood COI mtDNA gene tree for Peucetia viridans. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). Appendix B.10 contains information regarding individual samples. All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Peucetia viridans; scale bar = 0.0030 substitutions per site]

B.7. Maximum likelihood COI mtDNA gene tree for Misumenoides formosipes. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). Appendix B.10 contains information regarding individual samples. All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Misumenoides formosipes; scale bar = 0.0030 substitutions per site]

B.8. Distribution of average PCF values when pairwise comparisons are made between species simulated from the same tree. PCF values in the 90% HPD are shaded in black, and range from 0.71 to 0.99; 95% HPD is shaded in gray and black, ranging from 0.62 to 0.99. Maximum average PCF value is 0.9973.

[Figure: density plot; x axis: PCF Value (0.5 to 1.0); y axis: Density (0 to 4)]
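As a reading aid for the HPD ranges quoted in the caption of B.8, the following is a minimal sketch of how a highest-density interval can be computed empirically from a set of simulated values. It is an illustration under stated assumptions, not the procedure used to produce the figure: the function name `hpd_interval` and the example values are hypothetical, and the approach (the narrowest window covering the requested fraction of the sorted values) is one common empirical definition.

```python
import math


def hpd_interval(samples, mass=0.90):
    """Return (low, high) of the narrowest interval covering `mass` of the samples.

    Hypothetical helper for illustration; assumes a unimodal sample.
    """
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    # Number of points the interval must contain (small tolerance guards
    # against floating-point error in mass * n).
    k = max(1, math.ceil(mass * n - 1e-9))
    best_lo, best_hi = xs[0], xs[k - 1]
    # Slide a window of k consecutive sorted values and keep the narrowest.
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Made-up values for demonstration only (not data from this study).
example_values = [0.1, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.98, 0.99]
print(hpd_interval(example_values, mass=0.5))
```

Because the interval is chosen by width rather than by symmetric tail probabilities, it can sit against the upper bound of a left-skewed distribution, which is consistent with both intervals in B.8 ending at 0.99.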


B.9. Maximum likelihood gene tree (rps16–trnK) for Sarracenia alata. Samples are color coded depending on the geographic region in which they are found (see Fig. 6). All nodes with bootstrap support values ≥ 50 are shown; scale bar is in substitutions per site.

[Figure: gene tree for Sarracenia alata; scale bar = 0.0006 substitutions per site]

B.10. Arthropod sampling information for COI mtDNA and corresponding GenBank accession numbers for each individual. Locale letter corresponds to the sample sites in Figure 6.

Species  Sample code  Locale  COI GenBank
Sarcophaga sarraceniae  AS184  A  KU975801
Sarcophaga sarraceniae  AS185  A  KU975802
Sarcophaga sarraceniae  LR1  L  KU975803
Sarcophaga sarraceniae  LR2  L  KU975804
Sarcophaga sarraceniae  LR3  L  KU975805
Sarcophaga sarraceniae  LR4  L  KU975806
Sarcophaga sarraceniae  LR5  L  KU975807
Sarcophaga sarraceniae  LR6  L  KU975808
Sarcophaga sarraceniae  LR7  L  KU975809
Sarcophaga sarraceniae  LR8  L  KU975810
Sarcophaga sarraceniae  LR9  L  KU975811
Sarcophaga sarraceniae  LR17  L  KU975812
Sarcophaga sarraceniae  LR18  L  KU975813
Sarcophaga sarraceniae  LR24  L  KU975814
Sarcophaga sarraceniae  DS1  D  KU975815
Sarcophaga sarraceniae  DS2  D  KU975816
Sarcophaga sarraceniae  DS3  D  KU975817
Sarcophaga sarraceniae  DS4  D  KU975818
Sarcophaga sarraceniae  DS5  D  KU975819
Sarcophaga sarraceniae  DS6  D  KU975820
Sarcophaga sarraceniae  DS11  D  KU975821
Sarcophaga sarraceniae  DS12  D  KU975822
Sarcophaga sarraceniae  DS13  D  KU975823
Sarcophaga sarraceniae  DS14  D  KU975824
Sarcophaga sarraceniae  DS15  D  KU975825
Sarcophaga sarraceniae  DS16  D  KU975826
Sarcophaga sarraceniae  DS17  D  KU975827
Sarcophaga sarraceniae  DS18  D  KU975828
Sarcophaga sarraceniae  DS19  D  KU975829
Sarcophaga sarraceniae  DS20  D  KU975830
Sarcophaga sarraceniae  DS21  D  KU975831
Sarcophaga sarraceniae  DS22  D  KU975832
Sarcophaga sarraceniae  DS23  D  KU975833
Sarcophaga sarraceniae  DS25  D  KU975834
Sarcophaga sarraceniae  DS30  D  KU975835
Sarcophaga sarraceniae  DS31  D  KU975836
Sarcophaga sarraceniae  DS32  D  KU975837
Sarcophaga sarraceniae  DS35  D  KU975838
Sarcophaga sarraceniae  DS39  D  KU975839
Sarcophaga sarraceniae  DS40  D  KU975840
Sarcophaga sarraceniae  DS41  D  KU975841
Sarcophaga sarraceniae  DS42  D  KU975842
Sarcophaga sarraceniae  DS43  D  KU975843
Sarcophaga sarraceniae  DS44  D  KU975844
Sarcophaga sarraceniae  DS45  D  KU975845
Sarcophaga sarraceniae  DS46  D  KU975846
Sarcophaga sarraceniae  DS47  D  KU975847
Sarcophaga sarraceniae  DS62  D  KU975848
Sarcophaga sarraceniae  TL1  T  KU975849
Sarcophaga sarraceniae  TL4  T  KU975850
Sarcophaga sarraceniae  TL5  T  KU975851
Sarcophaga sarraceniae  TL6  T  KU975852
Sarcophaga sarraceniae  TL8  T  KU975853
Sarcophaga sarraceniae  TL9  T  KU975854
Sarcophaga sarraceniae  TL13  T  KU975855
Sarcophaga sarraceniae  TL14  T  KU975856
Sarcophaga sarraceniae  TL15  T  KU975857
Sarcophaga sarraceniae  TL16  T  KU975858
Sarcophaga sarraceniae  TL17  T  KU975859
Sarcophaga sarraceniae  TL18  T  KU975860
Sarcophaga sarraceniae  TL19  T  KU975861
Sarcophaga sarraceniae  TL22  T  KU975862
Sarcophaga sarraceniae  TL23  T  KU975863
Sarcophaga sarraceniae  TL30  T  KU975864
Sarcophaga sarraceniae  TL33  T  KU975865
Sarcophaga sarraceniae  TL35  T  KU975866
Sarcophaga sarraceniae  TL36  T  KU975867
Sarcophaga sarraceniae  TB1  Tb  KU975868
Sarcophaga sarraceniae  TB2  Tb  KU975869
Sarcophaga sarraceniae  TB3  Tb  KU975870
Sarcophaga sarraceniae  TB7  Tb  KU975871
Sarcophaga sarraceniae  TB8  Tb  KU975872
Sarcophaga sarraceniae  TB17  Tb  KU975873
Sarcophaga sarraceniae  TB30  Tb  KU975874

Sarcophaga sarraceniae  TB33  Tb  KU975875
Sarcophaga sarraceniae  TB34  Tb  KU975876
Sarcophaga sarraceniae  TB35  Tb  KU975877
Sarcophaga sarraceniae  TB37  Tb  KU975878
Sarcophaga sarraceniae  TB38  Tb  KU975879
Sarcophaga sarraceniae  TB39  Tb  KU975880
Sarcophaga sarraceniae  TB40  Tb  KU975881
Sarcophaga sarraceniae  TB41  Tb  KU975882
Sarcophaga sarraceniae  TB43  Tb  KU975883
Sarcophaga sarraceniae  TB46  Tb  KU975884
Sarcophaga sarraceniae  TB47  Tb  KU975885
Sarcophaga sarraceniae  BL1  B  KU975886
Sarcophaga sarraceniae  BL2  B  KU975887
Sarcophaga sarraceniae  BL3  B  KU975888
Sarcophaga sarraceniae  BL4  B  KU975889
Sarcophaga sarraceniae  BL5  B  KU975890
Sarcophaga sarraceniae  BL6  B  KU975891
Sarcophaga sarraceniae  BL7  B  KU975892
Sarcophaga sarraceniae  BL9  B  KU975893
Sarcophaga sarraceniae  BL13  B  KU975894
Sarcophaga sarraceniae  BL14  B  KU975895
Sarcophaga sarraceniae  BL16  B  KU975896
Sarcophaga sarraceniae  BL18  B  KU975897
Sarcophaga sarraceniae  BL19  B  KU975898
Sarcophaga sarraceniae  BL25  B  KU975899
Sarcophaga sarraceniae  BL26  B  KU975900
Sarcophaga sarraceniae  BL27  B  KU975901
Sarcophaga sarraceniae  BL28  B  KU975902
Sarcophaga sarraceniae  BL29  B  KU975903
Sarcophaga sarraceniae  BL30  B  KU975904
Sarcophaga sarraceniae  BL31  B  KU975905
Sarcophaga sarraceniae  BL32  B  KU975906
Sarcophaga sarraceniae  BL33  B  KU975907
Sarcophaga sarraceniae  BL34  B  KU975908
Sarcophaga sarraceniae  BL35  B  KU975909
Sarcophaga sarraceniae  BL39  B  KU975910
Sarcophaga sarraceniae  BL46  B  KU975911
Sarcophaga sarraceniae  PT11  P  KU975912
Sarcophaga sarraceniae  PT12  P  KU975913
Sarcophaga sarraceniae  PT16  P  KU975914

Sarcophaga sarraceniae  PT45  P  KU975915
Sarcophaga sarraceniae  CB1  C  KU975916
Sarcophaga sarraceniae  CB2  C  KU975917
Sarcophaga sarraceniae  CB3  C  KU975918
Sarcophaga sarraceniae  CB5  C  KU975919
Sarcophaga sarraceniae  CB6  C  KU975920
Sarcophaga sarraceniae  CB7  C  KU975921
Sarcophaga sarraceniae  CB11  C  KU975922
Sarcophaga sarraceniae  CB14  C  KU975923
Sarcophaga sarraceniae  CB15  C  KU975924
Sarcophaga sarraceniae  CB16  C  KU975925
Sarcophaga sarraceniae  CB18  C  KU975926
Sarcophaga sarraceniae  CB19  C  KU975927
Sarcophaga sarraceniae  CB31  C  KU975928
Sarcophaga sarraceniae  KS1  K  KU975929
Sarcophaga sarraceniae  KS2  K  KU975930
Sarcophaga sarraceniae  KS3  K  KU975931
Sarcophaga sarraceniae  KS4  K  KU975932
Sarcophaga sarraceniae  KS5  K  KU975933
Sarcophaga sarraceniae  KS10  K  KU975934
Sarcophaga sarraceniae  KS13  K  KU975935
Sarcophaga sarraceniae  KS14  K  KU975936
Sarcophaga sarraceniae  KS15  K  KU975937
Sarcophaga sarraceniae  KS16  K  KU975938
Sarcophaga sarraceniae  KS17  K  KU975939
Sarcophaga sarraceniae  KS18  K  KU975940
Sarcophaga sarraceniae  KS19  K  KU975941
Sarcophaga sarraceniae  KS20  K  KU975942
Sarcophaga sarraceniae  KS22  K  KU975943
Sarcophaga sarraceniae  KS35  K  KU975944
Sarcophaga sarraceniae  KS36  K  KU975945
Sarcophaga sarraceniae  KS40  K  KU975946
Sarcophaga sarraceniae  KS41  K  KU975947
Sarcophaga sarraceniae  RD2  R  KU975948
Sarcophaga sarraceniae  RD3  R  KU975949
Sarcophaga sarraceniae  RD4  R  KU975950
Sarcophaga sarraceniae  RD5  R  KU975951
Sarcophaga sarraceniae  RD6  R  KU975952
Sarcophaga sarraceniae  RD7  R  KU975953
Sarcophaga sarraceniae  RD8  R  KU975954

Sarcophaga sarraceniae RD18 R KU975955 Sarcophaga sarraceniae RD19 R KU975956 Sarcophaga sarraceniae RD20 R KU975957 Fletcherimyia celarata AS183 A KU976051 Fletcherimyia celarata TL3 T KU976052 Fletcherimyia celarata TL7 T KU976053 Fletcherimyia celarata TL10 T KU976054 Fletcherimyia celarata TL11 T KU976055 Fletcherimyia celarata TL20 T KU976056 Fletcherimyia celarata TL21 T KU976057 Fletcherimyia celarata TL25 T KU976058 Fletcherimyia celarata TL26 T KU976059 Fletcherimyia celarata TL27 T KU976060 Fletcherimyia celarata TL28 T KU976061 Fletcherimyia celarata TL29 T KU976062 Fletcherimyia celarata TL31 T KU976063 Fletcherimyia celarata TL32 T KU976064 Fletcherimyia celarata DS24 D KU976065 Fletcherimyia celarata DS33 D KU976066 Fletcherimyia celarata DS34 D KU976067 Fletcherimyia celarata TB4 Tb KU976068 Fletcherimyia celarata TB5 Tb KU976069 Fletcherimyia celarata TB6 Tb KU976070 Fletcherimyia celarata TB22 Tb KU976071 Fletcherimyia celarata TB24 Tb KU976072 Fletcherimyia celarata TB26 Tb KU976073 Fletcherimyia celarata TB27 Tb KU976074 Fletcherimyia celarata TB29 Tb KU976075 Fletcherimyia celarata TB31 Tb KU976076 Fletcherimyia celarata TB42 Tb KU976077 Fletcherimyia celarata TB48 Tb KU976078 Fletcherimyia celarata CB4 C KU976079 Fletcherimyia celarata CB10 C KU976080 Fletcherimyia celarata CB12 C KU976081 Fletcherimyia celarata CB13 C KU976082 Fletcherimyia celarata CB17 C KU976083 Exyra semicrocea AS211 A KU975958 Exyra semicrocea AS212 A KU975959 Exyra semicrocea AS218 A KU975960 Exyra semicrocea LR56 L KU975961


Exyra semicrocea LR57 L KU975962
Exyra semicrocea LR63 L KU975963
Exyra semicrocea LR65 L KU975964
Exyra semicrocea TL74 T KU975965
Exyra semicrocea TL80 T KU975966
Exyra semicrocea TL85 T KU975967
Exyra semicrocea TL86 T KU975968
Exyra semicrocea DS36 D KU975969
Exyra semicrocea DS104 D KU975970
Exyra semicrocea DS107 D KU975971
Exyra semicrocea DS109 D KU975972
Exyra semicrocea FC12 F KU975973
Exyra semicrocea FC13 F KU975974
Exyra semicrocea FC16 F KU975975
Exyra semicrocea FC19 F KU975976
Exyra semicrocea FC20 F KU975977
Exyra semicrocea TB75 Tb KU975978
Exyra semicrocea TB79 Tb KU975979
Exyra semicrocea BL36 B KU975980
Exyra semicrocea BL78 B KU975981
Exyra semicrocea BL84 B KU975982
Exyra semicrocea SD1 S KU975983
Exyra semicrocea PT6 P KU975984
Exyra semicrocea PT46 P KU975985
Exyra semicrocea PT50 P KU975986
Exyra semicrocea CB8 C KU975987
Exyra semicrocea CB85 C KU975988
Exyra semicrocea CB95 C KU975989
Exyra semicrocea CB96 C KU975990
Exyra semicrocea CB100 C KU975991
Exyra semicrocea CB103 C KU975992
Exyra semicrocea RD29 R KU975993
Exyra semicrocea RD33 R KU975994
Macroseius biscutatus AS223b A KU976084
Macroseius biscutatus AS223c A KU976085
Macroseius biscutatus LR67a L KU976086
Macroseius biscutatus LR67b L KU976087
Macroseius biscutatus TL91a T KU976088
Macroseius biscutatus TL91b T KU976089
Macroseius biscutatus DS113a D KU976090


Macroseius biscutatus DS113b D KU976091
Macroseius biscutatus FC27a F KU976092
Macroseius biscutatus TB87a Tb KU976093
Macroseius biscutatus TB87b Tb KU976094
Macroseius biscutatus PT56a P KU976095
Macroseius biscutatus PT56b P KU976096
Macroseius biscutatus RD32a R KU976097
Macroseius biscutatus RD32c R KU976098
Macroseius biscutatus CB102b C KU976099
Macroseius biscutatus CB102c K KU976100
Macroseius biscutatus KS34a K KU976101
Macroseius biscutatus KS34b K KU976102
Macroseius biscutatus KS34c K KU976103
Macroseius biscutatus KS34d K KU976104
Peucetia viridans AS59 A KU975995
Peucetia viridans AS188 A KU975996
Peucetia viridans AS189 A KU975997
Peucetia viridans AS192 A KU975998
Peucetia viridans LR33 L KU975999
Peucetia viridans LR34 L KU976000
Peucetia viridans LR45 L KU976001
Peucetia viridans TL39 T KU976002
Peucetia viridans TL40 T KU976003
Peucetia viridans TL46 T KU976004
Peucetia viridans TL60 T KU976005
Peucetia viridans DS37 D KU976006
Peucetia viridans DS38 D KU976007
Peucetia viridans DS53 D KU976008
Peucetia viridans DS54 D KU976009
Peucetia viridans DS66 D KU976010
Peucetia viridans DS97 D KU976011
Peucetia viridans FC4 F KU976012
Peucetia viridans FC5 F KU976013
Peucetia viridans FC6 F KU976014
Peucetia viridans TB12 Tb KU976015
Peucetia viridans TB13 Tb KU976016
Peucetia viridans TB49 Tb KU976017
Peucetia viridans TB52 Tb KU976018
Peucetia viridans BL37 B KU976019
Peucetia viridans BL38 B KU976020


Peucetia viridans BL40 B KU976021
Peucetia viridans BL41 B KU976022
Peucetia viridans BL64 B KU976023
Peucetia viridans PT17 P KU976024
Peucetia viridans PT18 P KU976025
Peucetia viridans PT23 P KU976026
Peucetia viridans PT36 P KU976027
Peucetia viridans PT39 P KU976028
Peucetia viridans RD9 R KU976029
Peucetia viridans RD11 R KU976030
Peucetia viridans RD12 R KU976031
Peucetia viridans RD21 R KU976032
Peucetia viridans RD22 R KU976033
Peucetia viridans RD23 R KU976034
Peucetia viridans CB20 C KU976035
Peucetia viridans CB21 C KU976036
Peucetia viridans CB33 C KU976037
Peucetia viridans CB52 C KU976038
Peucetia viridans CB53 C KU976039
Peucetia viridans CB71 C KU976040
Peucetia viridans CB72 C KU976041
Peucetia viridans CB73 C KU976042
Peucetia viridans KS23 K KU976043
Peucetia viridans KS42 K KU976044
Peucetia viridans KS43 K KU976045
Peucetia viridans KS52 K KU976046
Peucetia viridans KS53 K KU976047
Peucetia viridans KS54 K KU976048
Peucetia viridans KS65 K KU976049
Peucetia viridans KS66 K KU976050
Misumenoides formosipes AS1 A KU976105
Misumenoides formosipes AS2 A KU976106
Misumenoides formosipes AS3 A KU976107
Misumenoides formosipes AS31 A KU976108
Misumenoides formosipes AS32 A KU976109
Misumenoides formosipes AS195 A KU976110
Misumenoides formosipes DS70 D KU976111
Misumenoides formosipes BL53 B KU976112
Misumenoides formosipes BL65 B KU976113
Misumenoides formosipes PT28 P KU976114


Misumenoides formosipes RD13 R KU976115
Misumenoides formosipes RD14 R KU976116
Misumenoides formosipes RD24 R KU976117
Misumenoides formosipes CB22 C KU976118
Misumenoides formosipes CB23 C KU976119
Misumenoides formosipes CB35 C KU976120
Misumenoides formosipes CB57 C KU976121
Misumenoides formosipes CB58 C KU976122
Misumenoides formosipes KS24 K KU976123
Misumenoides formosipes KS25 K KU976124
Misumenoides formosipes KS26 K KU976125
Misumenoides formosipes KS27 K KU976126
Misumenoides formosipes KS28 K KU976127
Misumenoides formosipes KS29 K KU976128
Misumenoides formosipes KS30 K KU976129
Misumenoides formosipes KS31 K KU976130
Misumenoides formosipes KS44 K KU976131
Misumenoides formosipes KS45 K KU976132
Misumenoides formosipes KS46 K KU976133
Misumenoides formosipes KS47 K KU976134
Misumenoides formosipes KS55 K KU976135
Misumenoides formosipes KS56 K KU976136
Misumenoides formosipes KS67 K KU976137
Misumenoides formosipes KS68 K KU976138

B.11. RDA analysis. Results suggest that geography explains genetic variation in all species except the spiders (based on significant p-values). This pattern is further reflected in the R-squared values.

                S. sarra  F. celar  E. semic  M. biscu  P. virid  M. formo
p-value         0.001     0.004     0.001     0.001     0.909     0.641
R squared       0.680     0.319     0.443     0.775     0.019     0.052
Adj. R squared  0.676     0.274     0.410     0.750     -0.018    -0.009


B.12. Phylogeographic concordance factors for all permutations of K levels from the Pacific Northwest data set. For each K level, the taxonomic compositions are sorted to show the species groupings with the highest value of phylogeographic concordance. See Carstens et al. (2005) for details.

Model  K  PCF average  Species Composition
26     5  0.67         Ascap, Dicam, Micro, Pleth, Salix
23     4  0.83         Ascap, Dicam, Pleth, Salix
21     4  0.74         Ascap, Dicam, Micro, Pleth
24     4  0.60         Ascap, Micro, Pleth, Salix
22     4  0.59         Ascap, Dicam, Micro, Salix
25     4  0.59         Dicam, Micro, Pleth, Salix
12     3  0.99         Ascap, Dicam, Pleth
16     3  0.79         Ascap, Pleth, Salix
13     3  0.78         Ascap, Dicam, Salix
19     3  0.78         Dicam, Pleth, Salix
14     3  0.67         Ascap, Micro, Pleth
11     3  0.66         Ascap, Dicam, Micro
17     3  0.66         Dicam, Micro, Pleth
20     3  0.46         Micro, Pleth, Salix
15     3  0.46         Ascap, Micro, Salix
18     3  0.45         Dicam, Micro, Salix
3      2  0.99         Ascap, Pleth
6      2  0.98         Dicam, Pleth
1      2  0.98         Ascap, Dicam
10     2  0.69         Pleth, Salix
4      2  0.69         Ascap, Salix
7      2  0.67         Dicam, Salix
8      2  0.51         Micro, Pleth
2      2  0.51         Ascap, Micro
9      2  0.50         Micro, Salix
5      2  0.49         Dicam, Micro

B.13. Phylogeographic concordance factors for all permutations of K levels from the Neotropical bird data set. For each K level, the taxonomic compositions are sorted to show the species groupings with the highest value of phylogeographic concordance. See Smith et al. (2014b) for details.

Model  K  PCF average  Species Composition
11     4  0.81         Cymbi, Queru, Schif, Xenop
8      3  0.92         Cymbi, Queru, Xenop
10     3  0.83         Queru, Schif, Xenop
7      3  0.75         Cymbi, Queru, Schif
9      3  0.75         Cymbi, Schif, Xenop
5      2  1.00         Queru, Xenop
1      2  0.88         Cymbi, Queru
3      2  0.88         Cymbi, Xenop
2      2  0.86         Cymbi, Schif
4      2  0.75         Queru, Schif
6      2  0.75         Schif, Xenop
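
As an illustrative sketch (not part of the original analyses), the model numbers in B.12 and B.13 are consistent with enumerating every subset of two or more species in order of increasing K, which is what the combos function in PCF.py (B.17) does with itertools.combinations. The function name enumerate_models is hypothetical; the species abbreviations are taken from the two tables.

```python
from itertools import combinations


def enumerate_models(species):
    """Return every subset of two or more species, smallest K first."""
    models = []
    for k in range(2, len(species) + 1):
        models.extend(combinations(species, k))
    return models


# Species abbreviations as they appear in B.12 and B.13.
pnw = ["Ascap", "Dicam", "Micro", "Pleth", "Salix"]
birds = ["Cymbi", "Queru", "Schif", "Xenop"]

# Print model number, K, and composition for the Pacific Northwest set.
for number, model in enumerate(enumerate_models(pnw), start=1):
    print(number, len(model), ", ".join(model))
```

Under this enumeration, five species give 26 models and four species give 11, which agrees with the highest model number in each table.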

B.14. Results from PyMsBayes analyses. Prior distribution is centered around six divergence episodes. Posterior distributions are very similar to the prior distribution, regardless of how population sizes (ancestral and daughter) are treated.

Divergence Episodes  Unique models  Prior   θA ≠ θD Posterior  θA = θD Posterior
1                    1              0.0000  0.0000             0.0010
2                    63             0.0020  0.0020             0.0010
3                    301            0.0120  0.0110             0.0120
4                    350            0.0640  0.0730             0.0710
5                    140            0.2380  0.2520             0.1860
6                    21             0.3600  0.3930             0.4260
7                    1              0.3240  0.2690             0.3030


B.15. Results from PyMsBayes analyses with spiders removed. Prior distribution is centered around two divergence episodes. Posterior distributions are very similar to the prior distribution, regardless of how population sizes (ancestral and daughter) are treated.

Divergence Episodes  Unique models  Prior   θA ≠ θD Posterior  θA = θD Posterior
1                    1              0.3036  0.2980             0.3110
2                    15             0.4252  0.4550             0.4310
3                    25             0.2176  0.1950             0.2140
4                    10             0.0492  0.0430             0.0420
5                    1              0.0044  0.0090             0.0020
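
The "Unique models" column in B.14 and B.15 appears to correspond to the number of ways to assign n taxa to k shared divergence episodes, that is, the Stirling numbers of the second kind S(n, k), with n = 7 for the full data set and n = 5 with the two spiders removed. The short sketch below computes these counts with the standard recurrence; the function name stirling2 is ours and is not part of PyMsBayes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to partition n labelled taxa into k non-empty groups."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # Either the n-th taxon joins one of k existing groups,
    # or it forms a new group on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


for n in (7, 5):
    print(n, [stirling2(n, k) for k in range(1, n + 1)])
```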

B.16. Detailed information regarding the mutation rate and generation times used for PyMsBayes analyses.

PyMsBayes

Mutation rate and generation length are required for scaling the parameter estimate of divergence time into real time units. The mutation rate for animal mitochondrial DNA was estimated to be 6.2 × 10⁻⁸ per site per generation (Haag-Liautard et al. 2008), based on full mitochondrial genome sequencing of Drosophila fruit flies over many generations. Generation times were more challenging to estimate, given the lack of studies within these specific groups. We gathered information from the literature to best inform these values:

Sarracenia alata - These are long-lived perennial herbs, with an estimated life expectancy of 59 years (Brewer 2001). Following Zellmer et al. (2012), we used a conservative estimate of 2 years for generation time.


Fletcherimyia celarata and Sarcophaga sarraceniae - Work on the effects of diapause in flesh flies suggests a generation time of roughly 40 days (Denlinger 1978). Length of generation time, however, varies with both photoperiod and temperature (Chen et al. 1987; Lee et al. 1987). Given the warm, humid climate of the southeastern United States over much of the year, we used a generation time of 1/5 years for both of the fly species.

Exyra semicrocea - Observations by Moon et al. (2008) suggest these moths have two generations per year. Therefore, we used a generation time of 1/2 years.

Macroseius biscutatus - Experiments by Muma and Denmark (1967) recovered an average generation time ranging from ~12 to 18 days. We used a generation time of 1/12 years.

Peucetia viridans and Misumenoides formosipes - Araneomorph spiders generally have one generation per year (Foelix 1982). We used a generation time of 1 year.
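
The scaling described above can be sketched as follows. This is a minimal illustration, assuming that the divergence time estimate (tau) is expressed in expected substitutions per site, so that dividing by the mutation rate gives generations and multiplying by generation time gives years. The function name tau_to_years and the example value of tau are hypothetical; the mutation rate and generation times are those listed in this section.

```python
MU = 6.2e-8  # substitutions per site per generation (Haag-Liautard et al. 2008)

# Generation times in years, as listed above.
GENERATION_TIME = {
    "Sarracenia alata": 2.0,
    "Fletcherimyia celarata": 1 / 5,
    "Sarcophaga sarraceniae": 1 / 5,
    "Exyra semicrocea": 1 / 2,
    "Macroseius biscutatus": 1 / 12,
    "Peucetia viridans": 1.0,
    "Misumenoides formosipes": 1.0,
}


def tau_to_years(tau, generation_time, mu=MU):
    """Convert tau (assumed to be in substitutions per site) to years."""
    generations = tau / mu
    return generations * generation_time


example_tau = 0.01  # hypothetical value, for illustration only
for species, g in GENERATION_TIME.items():
    print(species, round(tau_to_years(example_tau, g)))
```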

B.17. Python script (PCF.py) for calculating Phylogeographic Concordance Factors. This script is available on GitHub (https://github.com/jordansatler/PhylogeographicConcordanceFactors).


PCF.py

#! /usr/bin/env python
import sys
import re
import subprocess
import os
import shutil
import itertools

"""
Script is a pipeline for generating Phylogeographic Concordance Factors. \
Input is posterior distribution of trees for each species (from *BEAST), \
and the script will clean the files, calculate counts for each unique \
topology for each species, and then calls BUCKy to calculate a concordance \
tree and concordance factors. Specifically, this returns a concordance tree \
and concordance factors for all possible combinations, plus returns the \
average nodal support for each concordance tree. The average nodal support \
value can be used to compare models of the same dimensions. \

Author: Jordan Satler
Date: 10 April 2015

###
This script was used to run all analyses in Satler and Carstens (2016).
Please see my github site for the latest version of the script.
https://github.com/jordansatler/PhylogeographicConcordanceFactors
###

"""

def clean(file):
    #Removes population information from *BEAST tree files.
    Openf = open(file, 'r')
    OutfileName = Openf.name[:-6]
    Out = []
    Trees = 0
    for line in Openf:
        line = line.strip()
        if "STATE" not in line:
            Out.append(line)
        elif "STATE" in line:
            r = re.sub('\[(.*?)\]', '', line)
            Out.append(r)
            Trees += 1
    Openf.close()
    return Out, OutfileName, Trees

def OGnum(postDist, filename):
    #Pass a posterior distribution of species tree and return with two OG added.
    MB_file = filename + '_mbsumReady.txt'
    OpenW = open(MB_file, 'w')
    Pops = []
    Traits = {}
    Added = ''

    for line in postDist:
        if 'STATE' not in line:
            if re.search('^\d+', line):
                line_a = line.strip(',').split()
                line_b = line.strip(',')
                Pops.append(line_a[0])
                Added += line_b + ',' + '\n'
                Traits[line_a[0]] = line_a[1]

            elif line == ';' and len(Pops) > 0:
                to_add = "%d OG1,\n%d OG2\n;\n" % (int(max(Pops)) + 1,
                                                   int(max(Pops)) + 2)
                Added += to_add

            else:
                Added += line + '\n'

        elif 'STATE' in line:
            s = re.sub(r'(\(.*\))',
                       r'((\1,' + str(OG1) + '),' + str(OG2) + ')', line)
            Added += s + '\n'

        if Pops:
            OG1 = int(max(Pops)) + 1
            OG2 = OG1 + 1

    Traits[str(int(max(Pops)) + 1)] = 'OG1'
    Traits[str(OG1 + 1)] = 'OG2'

    OpenW.write(Added)
    OpenW.close()
    return MB_file, Traits

def mbsum(trees, PostSum):
    #Calls mbsum to get unique topology counts.
    #Discards first 10% of trees as burn-in.
    #Then moves files into their respective folders
    OpenMB = open(trees, 'r')
    Name = OpenMB.name
    Out = Name[:-4] + "_mbsum_Results.txt"
    Post = PostSum * 0.1

    call = subprocess.call(["mbsum", "-n", str(int(Post)), "-o", str(Out),
                            str(Name)])
    OpenMB.close()

    #Move mbsum input files
    if os.path.exists("./mbsum/mbsum_in/"):
        shutil.move("./" + str(Name), "./mbsum/mbsum_in/" + str(Name))
    else:
        os.makedirs("./mbsum/mbsum_in/")
        shutil.move("./" + str(Name), "./mbsum/mbsum_in/" + str(Name))

    #Move mbsum output files
    if os.path.exists("./mbsum/mbsum_out/"):
        shutil.move("./" + str(Out), "./mbsum/mbsum_out/" + str(Out))
    else:
        os.makedirs("./mbsum/mbsum_out")
        shutil.move("./" + str(Out), "./mbsum/mbsum_out/" + str(Out))
    return None

def bucky():
    """This function will take in summaries of tree topology distributions
    and create a concordance tree with concordance factors."""

    path = "./mbsum/mbsum_out/"
    trees = [filename for filename in os.listdir(path)]
    #cmd = ["bucky", "-n", "1000", "-o", "PCF"]

    #This section returns community tree for all permutations
    Data_sets = combos(trees)
    Num = 1
    for j in Data_sets[0]:
        #cmd = ["bucky", "-a", "100", "-n", "100000", "-o", "PCF"]
        cmd = ["bucky", "--use-independence-prior", "-n", "100000", "-o", "PCF"]
        for l in range(len(j)):
            full = path + j[l]
            cmd.append(full)
        runBucky = subprocess.call(cmd)
        if not os.path.exists("./bucky_results/"):
            os.mkdir("./bucky_results")
        if not os.path.exists("./bucky_results/bucky_" + str(Num) + "/"):
            os.makedirs("./bucky_results/bucky_" + str(Num))

        Out_bucky = [file for file in os.listdir("./")]
        if not os.path.exists("./bucky_results/bucky_" + str(Num) + "/"):
            os.makedirs("./bucky_results/bucky_" + str(Num))
        for i in Out_bucky:
            if 'PCF' in i and i != 'PCF.concordance':
                shutil.move(str(i),
                            "./bucky_results/bucky_" + str(Num) + "/" + str(i))

            elif 'PCF' in i and i == 'PCF.concordance':
                buildPCF = concordance_tree(Traits)
                shutil.move(str(i),
                            "./bucky_results/bucky_" + str(Num) + "/" + str(i))

        nodalSup = calculate()

        taxa = []
        for tax in range(len(j)):
            taxonName = j[tax]
            taxa.append(taxonName[:-29])
        models = ', '.join(taxa)

        with open("All_Combinations.txt", 'a') as All:
            if Num == 1:
                Header = "Model\tK\tAverage\tTaxa"
                Out = "%d\t%d\t%.4f\t%s" % (Num, len(j), nodalSup, models)
                All.write(Header + '\n' + Out + '\n')
            if Num > 1:
                Out = "%d\t%d\t%.4f\t%s" % (Num, len(j), nodalSup, models)
                All.write(Out + '\n')

        shutil.move("PCF_Tree.tre",
                    "./bucky_results/bucky_" + str(Num) + "/PCF_Tree.tre")
        Num += 1
    return trees

def combos(trees):
    """This returns all possible permutations of taxa, from K of 2 to K of
    length(taxa), as a list."""

    comb = []
    for i in range(2, len(trees) + 1):
        for subset in itertools.combinations(trees, i):
            if subset not in comb:
                comb.append(subset)
    return comb, len(trees)

def concordance_tree(Traits):
    #Takes Bucky output and returns modified concordance tree with concordance factors
    Infile = open("./PCF.concordance", 'r')
    #Infile = open("./bucky_results/PCF.concordance", 'r')

    lines = []
    tmp = ''
    for line in Infile:
        line = line.strip()
        if tmp:
            lines.append(line)
            tmp = ''
        if line == 'Primary Concordance Tree with Sample Concordance Factors:':
            tmp = line

    Tree = ''
    for i in lines:
        tmp = ''
        for j in range(len(i)):
            if i[j] == ':' and tmp.isdigit() == True:
                Tree += Traits[tmp]
            else:
                Tree += tmp
            tmp = ''
            tmp = i[j]

    with open('PCF_Tree.tre', 'w') as PCF:
        PCF.write(Tree[1:len(Tree) - 21] + ');')

    Infile.close()
    return None

def calculate():
    """This will calculate the average nodal support"""
    with open("PCF_Tree.tre", 'r') as tree:
        for line in tree:
            line = line.strip()

            x = re.findall("\):\d.\d+", line)

            Total = 0
            for i in range(len(x) - 1):
                Total += float(x[i][2:])
            return Total / (len(x) - 1)

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print "python PCFs.py Input*"
        sys.exit()
    Filelist = sys.argv[1:]
    for file in Filelist:
        Out, OutfileName, Trees = clean(file)
        MB_file, Traits = OGnum(Out, OutfileName)
        mb_out = mbsum(str(MB_file), Trees)

        if Filelist.index(file) == len(Filelist) - 1:
            PCFs = bucky()


Appendix C: Chapter 4 Supplemental Material

C.1. STRUCTURE results for F. celarata at the K = 2 level, after we subsampled the eastern locales to match in sample size with the west. We subsampled two individuals from each of the three eastern locales, and repeated this process to get 10 subsampled data sets. Results show a strong east–west genetic clustering across nearly all of the data sets.


[Figure: STRUCTURE bar plots for Fletcherimyia celarata, replicates 1 to 10, with individuals grouped by locale: CB (West) and TL, DS, TB (East).]
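
The subsampling scheme described in C.1 can be sketched as follows. This is only an illustration of the procedure: the eastern sample codes are those listed for F. celarata in C.4, whereas the function name subsample_east, the fixed random seed, and the use of Python's random module are assumptions. It does not reproduce the individuals actually drawn for the ten replicates.

```python
import random

# Eastern F. celarata sample codes by locale, as listed in C.4.
EAST = {
    "TL": ["TL3", "TL7", "TL10", "TL11", "TL20", "TL21", "TL26", "TL27"],
    "DS": ["DS24", "DS33"],
    "TB": ["TB4", "TB5", "TB6", "TB22", "TB26", "TB27", "TB31", "TB42", "TB48"],
}


def subsample_east(per_locale=2, replicates=10, seed=0):
    """Draw `per_locale` individuals per eastern locale, `replicates` times."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    return [
        {locale: sorted(rng.sample(codes, per_locale))
         for locale, codes in EAST.items()}
        for _ in range(replicates)
    ]


for i, rep in enumerate(subsample_east(), start=1):
    print("replicate", i, rep)
```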


C.2. Divergence time estimates from FSC2. Results show estimates of divergence times in years across the Mississippi River for each of the ten replicated data sets. Mean and 95% CI are presented from an isolation-with-migration model with the linked AFS for each species.

[Figure: divergence time (Kya; axis 0 to 800) for E. semicrocea (moth), S. sarraceniae and F. celarata (flies), and M. formosipes and P. viridans (spiders).]

C.3. Effective population size estimates from populations on either side of the Mississippi River from FSC2. Results are from the ten replicated data sets. Mean and 95% CI are presented from an isolation-with-migration model with the linked AFS for each species.

[Figure: effective population size (Ne; axis 0 to 800,000) for western (W) and eastern (E) populations of E. semicrocea (moth), S. sarraceniae and F. celarata (flies), and M. formosipes and P. viridans (spiders).]

C.4. Arthropod sampling information. See Figure 6 for locality information.

Species  Sample code  Locale  Side of Mississippi River
Exyra semicrocea BL36 B west
Exyra semicrocea BL78 B west
Exyra semicrocea BL79 B west
Exyra semicrocea BL84 B west
Exyra semicrocea SD1 S west
Exyra semicrocea PT46 P west
Exyra semicrocea PT50 P west
Exyra semicrocea PT6 P west
Exyra semicrocea CB103 C west

Exyra semicrocea CB85 C west
Exyra semicrocea CB8 C west
Exyra semicrocea RD29 R west
Exyra semicrocea RD33 R west
Exyra semicrocea AS215 A east
Exyra semicrocea AS218 A east
Exyra semicrocea LR63 L east
Exyra semicrocea LR65 L east
Exyra semicrocea TL85 T east
Exyra semicrocea TL86 T east
Exyra semicrocea DS109 D east
Exyra semicrocea DS36 D east
Exyra semicrocea DS99 D east
Exyra semicrocea FC19 F east
Exyra semicrocea FC20 F east
Exyra semicrocea TB75 Tb east
Exyra semicrocea TB79 Tb east
Sarcophaga sarraceniae BL1 B west
Sarcophaga sarraceniae BL2 B west
Sarcophaga sarraceniae BL6 B west
Sarcophaga sarraceniae PT11 P west
Sarcophaga sarraceniae PT12 P west
Sarcophaga sarraceniae CB1 C west
Sarcophaga sarraceniae CB2 C west
Sarcophaga sarraceniae CB5 C west
Sarcophaga sarraceniae KS1 K west
Sarcophaga sarraceniae KS2 K west
Sarcophaga sarraceniae KS4 K west
Sarcophaga sarraceniae KS5 K west
Sarcophaga sarraceniae LR2 L east
Sarcophaga sarraceniae LR4 L east
Sarcophaga sarraceniae LR5 L east
Sarcophaga sarraceniae TL17 T east
Sarcophaga sarraceniae TL1 T east
Sarcophaga sarraceniae TL5 T east
Sarcophaga sarraceniae DS17 D east
Sarcophaga sarraceniae DS18 D east
Sarcophaga sarraceniae DS1 D east
Sarcophaga sarraceniae TB1 Tb east
Sarcophaga sarraceniae TB2 Tb east


Sarcophaga sarraceniae TB3 Tb east
Fletcherimyia celarata CB10 C west
Fletcherimyia celarata CB12 C west
Fletcherimyia celarata CB13 C west
Fletcherimyia celarata CB17 C west
Fletcherimyia celarata CB4 C west
Fletcherimyia celarata TL10 T east
Fletcherimyia celarata TL11 T east
Fletcherimyia celarata TL20 T east
Fletcherimyia celarata TL21 T east
Fletcherimyia celarata TL26 T east
Fletcherimyia celarata TL27 T east
Fletcherimyia celarata TL3 T east
Fletcherimyia celarata TL7 T east
Fletcherimyia celarata DS24 D east
Fletcherimyia celarata DS33 D east
Fletcherimyia celarata TB22 Tb east
Fletcherimyia celarata TB26 Tb east
Fletcherimyia celarata TB27 Tb east
Fletcherimyia celarata TB31 Tb east
Fletcherimyia celarata TB42 Tb east
Fletcherimyia celarata TB48 Tb east
Fletcherimyia celarata TB4 Tb east
Fletcherimyia celarata TB5 Tb east
Fletcherimyia celarata TB6 Tb east
Misumenoides formosipes BL53 B west
Misumenoides formosipes BL65 B west
Misumenoides formosipes PT28 P west
Misumenoides formosipes CB23 C west
Misumenoides formosipes CB35 C west
Misumenoides formosipes CB57 C west
Misumenoides formosipes CB58 C west
Misumenoides formosipes RD13 R west
Misumenoides formosipes RD14 R west
Misumenoides formosipes KS29 K west
Misumenoides formosipes KS30 K west
Misumenoides formosipes KS45 K west
Misumenoides formosipes KS46 K west
Misumenoides formosipes KS55 K west
Misumenoides formosipes KS56 K west


Misumenoides formosipes KS67 K west
Misumenoides formosipes KS68 K west
Misumenoides formosipes AS195 A east
Misumenoides formosipes AS1 A east
Misumenoides formosipes AS2 A east
Misumenoides formosipes AS31 A east
Misumenoides formosipes AS32 A east
Misumenoides formosipes AS3 A east
Misumenoides formosipes DS70 D east
Peucetia viridans BL37 B west
Peucetia viridans BL41 B west
Peucetia viridans BL64 B west
Peucetia viridans PT18 P west
Peucetia viridans PT23 P west
Peucetia viridans CB52 C west
Peucetia viridans CB71 C west
Peucetia viridans CB72 C west
Peucetia viridans RD11 R west
Peucetia viridans RD23 R west
Peucetia viridans KS53 K west
Peucetia viridans KS54 K west
Peucetia viridans KS65 K west
Peucetia viridans AS192 A east
Peucetia viridans AS59 A east
Peucetia viridans LR34 L east
Peucetia viridans LR45 L east
Peucetia viridans TL60 T east
Peucetia viridans DS38 D east
Peucetia viridans DS54 D east
Peucetia viridans DS97 D east
Peucetia viridans FC4 F east
Peucetia viridans FC5 F east
Peucetia viridans FC6 F east
Peucetia viridans TB13 Tb east
Peucetia viridans TB52 Tb east


C.5. Estimates of population genetic parameters from FSC2 from an isolation-with-migration model and linked AFS data sets.

Divergence times are in years, scaled by number of generations per year, and migration rates are in 2Nm. Values were averaged across the ten replicated data sets within each species.

Species         Parameter  Mean     95% CI
E. semicrocea   tau        224,402  59,231 – 389,574
                Ne WEST    133,862  121,635 – 146,088
                Ne EAST    514,357  475,452 – 553,262
                MWE        0.26     0.21 – 0.32
                MEW        0.83     0.43 – 1.22
S. sarraceniae  tau        246,899  232,637 – 261,161
                Ne WEST    152,965  136,162 – 169,768
                Ne EAST    639,139  600,377 – 677,901
                MWE        0.81     0.70 – 0.93
                MEW        1.18     0.87 – 1.50
F. celarata     tau        77,057   73,392 – 80,721
                Ne WEST    93,638   59,331 – 127,944
                Ne EAST    370,606  325,263 – 415,949
                MWE        1.53     0.95 – 2.12
                MEW        2.82     1.79 – 3.84
M. formosipes   tau        731,618  704,449 – 758,787
                Ne WEST    625,711  575,691 – 675,730
                Ne EAST    547,043  478,602 – 615,484
                MWE        4.58     4.07 – 5.09
                MEW        0.36     0 – 0.87
P. viridans     tau        193,740  180,196 – 207,283
                Ne WEST    339,848  263,784 – 415,912
                Ne EAST    348,199  281,047 – 415,350
                MWE        1.83     0.93 – 2.74
                MEW        6.95     3.12 – 10.78
