Mississippi State University Scholars Junction

Theses and Dissertations

1-1-2015

Genetic Structure of Rhizobia Associated with Chamaecrista Fasciulata

Hanna Elizabeth Dorman

Recommended Citation Dorman, Hanna Elizabeth, "Genetic Structure of Rhizobia Associated with Chamaecrista Fasciulata" (2015). Theses and Dissertations. 2398. https://scholarsjunction.msstate.edu/td/2398

This Graduate Thesis - Open Access is brought to you for free and open access by the Theses and Dissertations at Scholars Junction. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of Scholars Junction.

Genetic structure of rhizobia associated with Chamaecrista fasciulata

By

Hanna Elizabeth Dorman

A Thesis Submitted to the Faculty of Mississippi State University in Partial Fulfillment of the Requirements for the Degree of Masters of Science in Biological Sciences in the Department of Biological Sciences

Mississippi State, Mississippi

August 2015

Copyright by

Hanna Elizabeth Dorman

2015

Genetic structure of rhizobia associated with Chamaecrista fasciulata

By

Hanna Elizabeth Dorman

Approved:

______Lisa Wallace (Major Professor)

______Gary N. Ervin (Committee Member)

______Matthew W. Brown (Committee Member)

______Mark E. Welch (Graduate Coordinator)

______R. Gregory Dunaway Professor and Dean College of Arts & Sciences

Name: Hanna Elizabeth Dorman

Date of Degree: August 14, 2015

Institution: Mississippi State University

Major Field: Biological Sciences

Major Professor: Lisa Wallace

Title of Study: Genetic structure of rhizobia associated with Chamaecrista fasciulata

Pages in Study: 57

Candidate for Degree of Masters of Science

The legume-rhizobia relationship is an important symbiosis. Studies have found that the specificity and functionality of this symbiosis can vary among plants of the same species and among rhizobia, as well as in concert with geographical variation. Here, we examined the diversity and geographic structure of rhizobia nodulating Chamaecrista fasciculata, which grows throughout the east-central U.S. and is symbiotic with Bradyrhizobium species. We investigated the association of geography and soil variables with rhizobial diversity by sampling plant nodules and soil across Mississippi and evaluated variation in rhizobia housed in different nodules of individual plants. Using nifH and truA, we conducted phylogenetic analyses and Mantel tests but did not find that geography correlates with genetic diversity. However, soil variables and genetic distance were significantly correlated. Lastly, we found that rhizobia across nodules of the same plant varied substantially. These results contribute to the knowledge of rhizobial assemblages in natural populations.

DEDICATION

There are many people without whom this thesis would not have been written. My family has been and always will be my biggest supporters. My parents, Gary and Michele Listug, thank you for your unwavering love and support. Thank you Brooke Dorman, my twin sister and best friend, for always editing my papers and for being there from womb to tomb. Thank you Caleb Dorman for always providing perspective and helping me laugh. Timothy Barclay II, thank you for loving me and holding my hand through this process. Also, thank you Richard S. Dorman Sr. for always expecting excellence. Without my family, I would surely have settled.

Additionally, thank you Armed Rasberry, Rosanna Carreras, Lavanaya Challagundla, and Giuliano Colosimo for being my friends and lab mates. These friendships are certainly the best I have ever had. Lastly, I owe Dr. Lisa Wallace, my adviser, my deepest gratitude. You have limitless patience and are a constant source of knowledge and inspiration. I am grateful every day that you took me on as a student.

Thank you.

ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Mississippi State University and the Department of Biological Sciences for giving me numerous opportunities as a student, researcher, and teaching assistant. I would like to state my sincere appreciation for my adviser, Dr. Lisa Wallace. Her guidance, expertise, and teaching proved invaluable. Without her willingness to expand her research into new territories, I would not have completed this thesis. I would like to thank my committee members, Dr. Gary Ervin and Dr. Matthew Brown, for the many hours of reading, editing, and time they have freely given to my project. Also, I would like to thank Haley Bodden for her help as an undergraduate researcher and Dr. Ron Altig for sharing his vast knowledge of scientific writing.

In addition, I would like to thank the Botanical Society of America for the 2013 Graduate Student Research Award, the Mississippi State Biology Department Faculty Fund (BFF) 2012, 2013, 2014 and 2015, and the American Society of Plant Taxonomists Undergraduate Researcher of the Year (2012) for helping fund this research.

TABLE OF CONTENTS

DEDICATION

ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER

I. INTRODUCTION

1.1 Importance of Rhizobia
1.2 The Symbiosis
1.3 Biogeography
1.4 Hypotheses

II. MATERIALS AND METHODS

2.1 Study System
2.2 Sampling Design
2.3 Genetic Analysis
2.4 Soil Analysis

III. RESULTS

3.1 Genetic Analyses
3.2 Genetic Difference Between Nodules
3.3 Comparison Between Genetic and Environmental Data

IV. DISCUSSION

4.1 Rhizobia Diversity and Legume Specificity
4.2 Genetic Structure of Symbiotic Rhizobia
4.3 Conclusions

REFERENCES

APPENDIX

A. COLLECTION SITES, HERBARIUM RECORDS, COORDINATES (WSG83), AND SOIL VARIABLES

B. GENETIC DIVERSITY OF RHIZOBIA AT EACH SAMPLING SITE FOR THE CONCATENATED DATA SET, TRUA, AND NIFH

C. GENETIC DISTANCE MATRIX BETWEEN SAMPLING SITES: A LOWER DISTANCE MATRIX OF THE CONCATENATED GENES WAS GENERATED USING THE TAJIMA-NEI MODEL (TAJIMA AND NEI 1984) USING THE GAMMA PARAMETER IN MEGA (TAMURA ET AL., 2013)

D. GEOGRAPHIC DISTANCE MATRIX OF SAMPLING SITES GENERATED USING THE HAVERSINE FORMULA (RICK 1999)

E. ENVIRONMENTAL DISTANCE MATRIX OF SAMPLING SITES: A COMPOSITE SOIL DISTANCE MATRIX WAS GENERATED USING SQUARED EUCLIDEAN DISTANCE WITH A Z-SCORE VARIANCE CORRELATION IN SPSS (2007)

F. PHYLOGENETIC NETWORK DEPICTING SAMPLING SITES IN EACH GROUP

LIST OF TABLES

2.1 Spearman’s rank correlation between measured soil characteristics

3.1 Measures of diversity based on nifH, truA, and the concatenated data set across the seven geographic regions sampled for this study

3.2 Results from an AMOVA of rhizobia genotypes defined by network groups

3.3 Pairwise FST values by phylogenetic group

3.4 AMOVA of geographic region

3.5 Pairwise FST values by region

3.6 Variation in genotypes of rhizobia isolated from separate nodules on single host plants

LIST OF FIGURES

2.1 Map of Mississippi

3.1 Neighbor Network

3.2 Graphical comparison of the pairwise genetic distances

F.1 Sampling sites represented by RXX

CHAPTER I

INTRODUCTION

1.1 Importance of Rhizobia

In natural and agricultural ecosystems, nitrogen is a primary limiting factor for plant production. Improving soil nitrogen content ultimately leads to an increase in plant diversity (Graham and Vance, 2003). One of the most important processes by which usable nitrogen is provided to plants is via nitrogen-fixing bacteria in the soil. Many of these bacteria, also known as rhizobia, exist freely in the soil, but many others have established symbiotic relationships with a variety of plants. Symbioses between legumes (Fabaceae) and rhizobia are among the most important and most studied plant-microbe endosymbiotic systems known (Kouchi et al., 2010). By fixing atmospheric nitrogen into inorganic nitrogen compounds, rhizobia allow legumes to grow in degraded soils (Barton and Northup 2011) and may have contributed to their evolutionary diversification around the world. Approximately 88% of legume species form symbiotic relationships with rhizobia (Graham and Vance, 2003). These symbioses are particularly important in agricultural species (Graham and Vance, 2003; Peoples and Herridge, 1990), including common pea (Pisum sativum L.), chickpea (Cicer arietinum L.), broad bean (Vicia faba L.), pigeon pea (Cajanus cajan L.), cowpea (Vigna unguiculata (L.) Walp.), and soybean (Glycine max (L.) Merr.).

1.2 The Symbiosis

Although divergence of rhizobia genera predates the evolution of legumes (Turner and Young 2000), the ability to develop nodules for rhizobia is a derived trait in the legume family (Laguerre et al., 2001; Silva et al., 2005). The great diversity of legume species (est. 20,000 species; Cronk et al., 2006), coupled with their presence in many different habitats around the world (Dimmitt, 2014), may indicate that symbioses with rhizobia are a key innovation promoting diversification across this family. For example, Béna et al. (2005) showed that Medicago species exhibiting lower rhizobia specificity have larger ranges, and some Medicago species evolved toward increased specificity because of reduced benefit when they host numerous rhizobia genotypes. Thus, evolutionary diversification of Medicago appears to involve development of new symbiotic abilities in order to exhibit efficient symbiosis in variable habitats (Béna et al., 2005). Across Fabaceae, nodulating rhizobia species are phylogenetically diverse and include the genera Rhizobium, Sinorhizobium, Mesorhizobium, Bradyrhizobium, and Azorhizobium (Balachandar et al., 2007).

Legume-rhizobium symbioses often exhibit a high degree of specificity (Kouchi et al., 2010) that is determined by molecular signals between rhizobia and legumes (Yang et al., 2010). Many studies suggest that most legume species are symbiotic with a single rhizobium species (Denison and Kiers, 2004), and specificity may even extend to the level of rhizobia strains within a single species (Lohar and VanderBosch, 2005). For example, Medicago, Melilotus, and Trigonella species are nodulated only by Sinorhizobium meliloti, and Pisum, Vicia, Lens, and Lathyrus species are nodulated only by Rhizobium leguminosarum biovar viciae. Additionally, some rhizobia are found only on certain plant species, such as Rhizobium leguminosarum biovar trifolii, which nodulates only Trifolium species (Hirsch et al., 2001). When sampling nodules across a plant, McInnes et al. (2004) found that 30% of nodules housed a specific strain of rhizobia while the remaining 70% were symbiotic with variable strains, where one nodule represented one strain. While some legumes have evolved toward highly specific legume-rhizobia relationships, other species appear to have developed symbioses with a diverse set of symbionts. For example, rhizobia such as Rhizobium NGR234 can nodulate over 110 genera of legumes (Pueppke and Broughton, 1999). Also, the legume Phaseolus vulgaris, which forms symbioses with at least 20 species of rhizobium, exhibits a broad range of symbiotic partners (Michiels et al., 1998). Kiers et al. (2003) also found that soybean, Glycine max (L.) Merr., can be infected with several rhizobium strains from at least two species, suggesting that generalist symbiotic relationships between legume and rhizobia could be beneficial to the plant. In addition to taxonomic diversity, functional variation of symbioses with rhizobia has been demonstrated in numerous studies. For example, Heath et al. (2010) found that plant fitness varied among combinations of Medicago truncatula, the host, and Sinorhizobium meliloti, the symbiont. Research also suggests that plant fitness can vary among rhizobia strains and be geographically structured (Parker, 1999).

1.3 Biogeography

Despite some well-characterized examples in agricultural species, little research has been undertaken to investigate the breadth of variation in rhizobia symbioses with natural legumes across geographic space (Rincon et al., 2007). A common hypothesis for biogeographic patterns of bacteria is ‘everything is everywhere, but the environment selects’ (Becking, 1934), which suggests that soil microbes respond to contemporary environmental conditions and have high dispersal capabilities, thereby erasing the effects of past evolutionary and ecological events (Martiny et al., 2006). Recently this hypothesis has been treated with skepticism because bacteria that regulate important aspects of plant biology, such as nitrogen cycling in nodules, are subject to fundamental evolutionary processes of geographical isolation and natural selection (Rout and Calloway, 2012).

Hypotheses of biogeographic patterns for rhizobia often stem from the idea that rhizobia co-evolve with their host species at the centers of origin and diversify with their hosts across landscapes (Martínez-Romero and Caballero-Mellado, 1996). As new legume species expand their distributions, they may either maintain these original relationships or diversify as they encounter new rhizobia types. Thus, geographic structure in symbiotic rhizobia is expected, but the relative strength of environment versus host species in determining biogeographic patterns of rhizobia is unclear.

For soil bacteria, such as symbiotic rhizobia, there is evidence for environmental properties influencing diversity. For example, Martir et al. (2007) found, when examining root nodules of Dalea purpurea, that the rhizobia community structure was affected by location and soil heterogeneity. Soil microbes often assemble based on cues from the environment (Pasternak, 2013; Xoing et al., 2012). While factors such as climate, precipitation, organic matter, and soil texture have been found to influence soil microbial biogeographic patterns, the most prominent force behind soil microbe structure may be soil pH (Chong et al., 2012). Soil pH is known to influence soil microbial diversity at local (Lauber et al., 2008), regional (Griffiths et al., 2011), and continental (Fierer and Jackson, 2005) scales. By contrast, Sachs et al. (2009) found evidence that plant host identity was important in explaining rhizobia diversity because hosts were infected by a small subset of rhizobia available in the soil. More studies of geographic variation in rhizobia are needed across a diversity of plant hosts to evaluate whether rhizobia are more likely to exhibit patterns similar to free-living soil bacteria or co-evolving taxa.

1.4 Hypotheses

In this study, we characterized rhizobia that are symbiotic with a common and widespread legume, Chamaecrista fasciculata (Michx.) Greene. Two hypotheses were tested. The first hypothesis is that rhizobia symbiotic with C. fasciculata are geographically structured. Parker and Kennedy (2006) found that C. fasciculata is nodulated by Bradyrhizobium elkanii in Connecticut, but experimental tests have shown that C. fasciculata can form symbioses with other species, such as Bradyrhizobium japonicum (Tlusty et al., 2004). Given that C. fasciculata grows in a wide diversity of habitats with varying soil types, I expected that there would be significant genetic differentiation of nodulating rhizobia across habitats, which could reflect environmental influences or co-evolved relationships with local host genotypes. The extent to which abiotic factors of the soil are correlated with genetic structure of nodulating rhizobia was also quantified. I expected that pH would have the strongest effect on rhizobia structure, as pH often shapes the structure of soil microbes (Chong et al., 2012).

The second hypothesis is that C. fasciculata plants producing multiple nodules can be symbiotic with more than one rhizobia type. Tlusty et al. (2004) demonstrated experimentally that C. fasciculata can be symbiotic with rhizobia other than B. elkanii. McInnes et al. (2004) found that plants of Medicago species can be symbiotic with multiple strains of Sinorhizobium. Additionally, widespread legumes are thought to be highly promiscuous (Ndlovu et al., 2013). Therefore, it seems plausible that C. fasciculata could establish multiple symbioses. If multiple rhizobia types are found across nodules of individual plants, then I expect them to differ at the strain level but to belong to the same species.

CHAPTER II

MATERIALS AND METHODS

2.1 Study System

Chamaecrista fasciculata has a range that extends from Minnesota to Mississippi and from the east coast of the U.S. to New Mexico (USDA NRCS 2012). It is an annual species that flowers from July to September. Plants grow to a height between one and three feet and produce inflorescences of yellow flowers marked by red. The flowers attract bees, ants, and butterflies, which often act as pollinators for the plant. Extrafloral nectaries are found on the petioles. The fruit is a narrow pod that is between 1.5 and 2.5 inches long and spirals and splits after maturing to disperse the seeds from the parent.

The morphological and ecological variation exhibited by this species has led some taxonomists to recognize distinct taxa. For example, Weakley (2012) describes four varieties of C. fasciculata, but these have not been verified in studies quantifying variation across the range of this species. Thus, here I treat all populations as a single species. Chamaecrista fasciculata is an important species in many ecosystems because it provides cover, nectar, and pollen for animals (USDA NRCS 2012). This species grows in open habitats, such as prairies, bluffs, riverbanks, and upland woods, and can grow in many types of soil (USDA NRCS 2012).

2.2 Sampling Design

Sampling sites across the physiographic regions of Mississippi were selected based on herbarium records and to represent variation in soil habitat (Fig. 2.1). Rhizobia in nodules of C. fasciculata were sampled from a total of 23 locations in the Delta (n = 3), Loess Hills (n = 3), North Central Hills (n = 3), Tombigbee Hills (n = 3), South Central Hills (n = 4), Jackson Prairie (n = 3), and Black Belt Prairie (n = 4) (Fig. 2.1 and Appendix A). Entire roots containing nodules were sampled from at least six plants per site. Plants were randomly selected at each site and represented the entire area of growth of the species. This sampling scheme was used to capture variation in rhizobia at a wide geographic scale, rather than to characterize complete rhizobia diversity at any individual site. A total of 108 plants and 159 nodules were used in this study. A voucher plant specimen, soil sample from the top 10 inches near one plant, and GPS coordinates were collected at each sampling location. Plant vouchers are deposited in the MSU herbarium. Roots with nodules were stored at -80°C until processed for DNA extraction.

Figure 2.1 Map of Mississippi

Notes: Showing the seven designated regions corresponding to physiographic areas. Sampled sites are indicated by dots labeled as Rxx.

2.3 Genetic Analysis

Rhizobia DNA was extracted from one or two nodules (when present) per plant using the Qiagen DNeasy Plant Mini Kit (Qiagen, Valencia, CA) and diluted in 200 µl buffer. Rhizobia were genotyped using partial sequences of the nifH and truA genes. Two genes were analyzed to identify the rhizobia because the use of multiple genes often makes it possible to fingerprint individuals (Tan et al., 2012; Vinuesa et al., 2005; Zhang et al., 2012). nifH is involved in nitrogen fixation (Laguerre et al., 2001), and truA is a housekeeping gene involved in translation and ribosomal biogenesis (Ahn et al., 2004). Both of these markers have been used by others to characterize rhizobia diversity (Zhang et al., 2012; Vinuesa et al., 2004). nifH was amplified and sequenced using primers outlined in Vinuesa et al. (2005). truA was amplified and sequenced using primers from Zhang et al. (2012).

PCR was used to amplify the regions in 12.5 µl volumes containing 1.5 µl DNA, 1X LongAmp buffer (New England Biolabs, Ipswich, MA), 0.8% DMSO, 1.5 U LongAmp Taq (New England Biolabs), 0.32 mM dNTPs, 0.4 µM forward primer, and 0.4 µM reverse primer. For both genes, prior to the addition of DNA, the reaction tubes were heated to 95°C. The nifH program consisted of denaturation at 95°C for 3.5 min., 30 cycles of 93.5°C for 1 min., 58°C for 1 min., 72°C for 1 min., and an elongation step of 72°C for 5 min. truA required a touchdown thermal cycler program as follows: denaturation at 95°C for 5 min., 11 cycles of 94°C for 45 sec., 60°C for 1 min. decreased by 1.0°C per cycle, 72°C for 1 min., 26 cycles of 94°C for 45 sec., 50°C for 1 min., 72°C for 1 min., and an elongation step of 72°C for 10 min. Successful amplification was confirmed by running a small sample on 1.5% agarose TBE gels with ethidium bromide staining. A negative control was included with each set of reactions to check for contamination.

Single band PCR products were cleaned by adding 0.2x Antarctic Phosphatase buffer, 5 units of Exonuclease I, and 1.25 units of Antarctic Phosphatase to 9.5 µl of PCR product. This mixture was heated to 37°C for 15 minutes followed by 80°C for 15 minutes. Once the samples were cleaned, cycle sequencing was conducted in 10 µl reactions using forward and reverse primers and Big Dye version 3.1 (Life Technologies, Carlsbad, California, USA) in separate reactions. Sequenced samples were dried and sent to Arizona State University DNA Lab for capillary electrophoresis. Forward and reverse sequences were edited and assembled into a consensus sequence using Sequencher version 4.7 (Gene Codes Corporation, Ann Arbor, Michigan, USA). Sequences were manually aligned using Se-Al v.2.0 (Rambaut 1996). Sequences are deposited in GenBank (Accessions KR186321-KR186443).
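As a rough illustration of the truA touchdown program described above, the annealing temperature used in each cycle can be enumerated. This is only a sketch: it assumes the first touchdown cycle anneals at 60°C and that each later touchdown cycle is 1.0°C cooler, and the function name and parameters are hypothetical rather than taken from any thermal cycler software.

```python
def touchdown_annealing_temps(start_c=60.0, step_c=1.0, touchdown_cycles=11,
                              final_c=50.0, final_cycles=26):
    """Return the annealing temperature (in degrees C) for every PCR cycle.

    Assumption: the decrement is applied from the second touchdown cycle
    onward, so 11 cycles starting at 60 C end at 50 C.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    plateau = [final_c] * final_cycles
    return touchdown + plateau


temps = touchdown_annealing_temps()
print(len(temps))   # total number of amplification cycles
print(temps[:11])   # touchdown phase only
```

Under these assumptions the touchdown phase ends at the same temperature used for the remaining 26 cycles, which matches the 50°C annealing step given in the text.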

The presence of recombinant sequences in each data set was evaluated using multiple methods in RDP4 (Martin and Rybicki 2000): GENECONV (Padidam et al., 1999), Bootscan (Martin et al., 2005), and Chimaera (Posada et al., 2001). Significance was assessed at p < 0.05. Two recombinant sequences were detected and therefore removed from the dataset. Sequence variation for each gene at each of the 23 sampling sites and seven regions was quantified using DnaSP v. 5 (Librado and Rozas 2009) by calculating number of variable sites (S), haplotypes (h), haplotype diversity (Hd), and nucleotide diversity (π) (Table 3.1 and Appendix B). Analysis of variance (ANOVA; Fisher, 1925) was used to evaluate whether the diversity measures differed significantly among regions. When the global test was significant, we used Tukey’s HSD (Smith 1971) post-hoc analysis to establish where significant differences among the regions existed. Significance was assessed at p < 0.05 in ANOVA and Tukey’s tests. We examined each gene separately in the ANOVA analysis. A pairwise distance matrix of individuals was generated for each data set using MEGA v. 6 (Tamura et al., 2013). These matrices were used in testing for congruence between the two genes with the R script Congruence Among Distance Matrices (CADM v. 3.0-11) (Campbell et al., 2011). A run with 999 permutations was used to test the null hypothesis of gene incongruence using a p-value of 0.05 to assess significance.
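The four diversity measures named above can be sketched for a small alignment as follows. This is a minimal illustration, not the DnaSP implementation: it assumes equal-length aligned sequences with no gaps or missing data, uses the standard unbiased estimator Hd = n/(n-1) × (1 − Σp²) for haplotype diversity, and takes π as the mean proportion of differing sites over all pairs. The function name and the toy sequences are hypothetical.

```python
from itertools import combinations


def diversity_stats(seqs):
    """Return (S, h, Hd, pi) for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    length = len(seqs[0])

    # S: alignment columns showing more than one nucleotide state
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # h: number of distinct sequences (haplotypes), with their counts
    counts = {}
    for s in seqs:
        counts[s] = counts.get(s, 0) + 1
    h = len(counts)

    # Hd: probability that two randomly drawn sequences differ (unbiased)
    Hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # pi: average proportion of differing sites over all sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(
        sum(a != b for a, b in zip(x, y)) / length for x, y in pairs
    ) / len(pairs)
    return S, h, Hd, pi


# Toy alignment (hypothetical data, for illustration only)
toy = ["ACGT", "ACGA", "ACGA", "TCGA"]
print(diversity_stats(toy))
```

Real data sets with missing nucleotides at sequence ends, as reported for both genes in this study, would need the handling of missing sites that dedicated software provides.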

Given that no evidence of incongruence of the two genes was detected, they were combined to assess evolutionary relatedness of the samples using the NeighborNet algorithm (Bryant and Moulton, 2004) in SplitsTree 4 (Huson and Bryant, 2006). Several sequences of Bradyrhizobium from GenBank (Bilofsky and Christian, 1988) were also included in the network analysis to evaluate clustering with known species of Bradyrhizobium. After an initial network was generated using the sequences collected for this project, a sample of them was compared to other rhizobia sequences in GenBank through BLASTn searches to identify related species and strains. We chose one sequence from each network group (Fig. 3.1 and Appendix F) to use in a GenBank BLASTn search. We used sequences of the top 1-3 hits in a subsequent phylogenetic network analysis under the same run parameters detailed above. We chose the top results from the GenBank standard nucleotide database and optimized the BLAST program for megablast, which selects for highly similar sequences. The sequences recovered from GenBank had E-values of 0.0 and a percent identity greater than 95.

Using MEGA 6 (Tamura et al., 2013), a lower distance matrix of the concatenated genes was generated using the Tajima-Nei model (Tajima and Nei, 1984) with the rate variation among sites modeled by selecting the gamma parameter (Appendix C). We chose the Tajima-Nei model because it allows for deviations in nucleotide frequencies (Tajima and Nei, 1984). The genetic distance matrix was used in an analysis of molecular variance (AMOVA) (Excoffier et al., 1992) to investigate the degree of genetic divergence among the seven geographic regions, and in a separate analysis, among the groups identified in the phylogenetic network. Only samples collected for this study from C. fasciculata were included in AMOVA. Arlequin v. 3.5 (Excoffier, 2010) was used to run AMOVA based on pairwise genetic distance; 9999 permutations were used to assess significant genetic differentiation among the defined groups based on a p-value < 0.05. When a significant global FST value was identified in AMOVA, pairwise comparisons of FST for each region or phylogenetic group were generated using Arlequin to identify specific areas of genetic divergence in rhizobia genotypes. Significance of pairwise FST values was assessed at a p-value < 0.05 using permutation tests with 9999 iterations.
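For readers unfamiliar with the Tajima-Nei distance used above, the following sketch computes it for one pair of aligned sequences. It assumes the commonly documented form d = −b ln(1 − p/b), with b = ½[1 − Σg_i² + p²/c] and c = Σ_{i<j} x_ij²/(2 g_i g_j), and the gamma-corrected form d = a·b[(1 − p/b)^(−1/a) − 1]. The function name is hypothetical, and the gamma shape value is left as a parameter because the thesis does not state which value was used.

```python
import math
from itertools import combinations


def tajima_nei(seq1, seq2, gamma_shape=None):
    """Tajima-Nei distance between two aligned sequences (a sketch).

    Sites where either sequence is not A, C, G, or T are skipped.
    The estimate is undefined when p >= b (math domain error).
    """
    sites = [(s, t) for s, t in zip(seq1, seq2) if s in "ACGT" and t in "ACGT"]
    n = len(sites)
    p = sum(s != t for s, t in sites) / n
    if p == 0:
        return 0.0

    # g: average frequency of each nucleotide across both sequences
    g = {
        nt: (sum(s == nt for s, _ in sites) + sum(t == nt for _, t in sites))
        / (2 * n)
        for nt in "ACGT"
    }

    # c: sum over unordered nucleotide pairs of x_ij^2 / (2 g_i g_j)
    c = 0.0
    for i, j in combinations("ACGT", 2):
        x_ij = sum({s, t} == {i, j} for s, t in sites) / n
        if x_ij > 0:
            c += x_ij ** 2 / (2 * g[i] * g[j])

    b_const = 0.5 * (1 - sum(f ** 2 for f in g.values()) + p ** 2 / c)
    if gamma_shape is None:
        return -b_const * math.log(1 - p / b_const)
    return gamma_shape * b_const * ((1 - p / b_const) ** (-1 / gamma_shape) - 1)


# Hypothetical sequences: identical except for the last four sites
s1 = "ACGTACGTACGTACGTACGT"
s2 = "ACGTACGTACGTACGTCATG"
print(tajima_nei(s1, s2))                   # without rate variation
print(tajima_nei(s1, s2, gamma_shape=1.0))  # gamma-distributed rates
```

When nucleotide frequencies are equal and all substitution types are equally common, b reduces to 3/4 and the expression becomes the Jukes-Cantor distance, which is one way to see how the model accommodates unequal frequencies.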

To evaluate genetic diversity of rhizobia isolated from different nodules on a common host plant, we estimated, for each gene, the pairwise genetic distance within each of the 51 sets of sequences using Mesquite v. 3.02 (Maddison and Maddison, 2015). These distances were then compared to intraspecific and interspecific genetic distances reported in other studies that have examined nifH (Gaby and Buckley, 2014) and truA (Zhang et al., 2012) in rhizobia. We also examined the location of each concatenated rhizobia genotype sampled from the same plant in the phylogenetic network.

2.4 Soil Analysis

Soil samples were allowed to air dry for 20 days and then ground using a mortar and pestle. Any organic material was removed prior to grinding. The soil samples were sent to the University of Arkansas for analysis of pH, P, K, Mg, S, Na, Fe, Mn, Zn, Cu, and B. These variables were used to assess possible soil factors affecting rhizobia assemblages. Spearman’s non-parametric correlation (Spearman, 1907) was used to identify redundant soil variables in the dataset based on a correlation coefficient of 0.6 or higher between any two variables. Correlations were conducted using SPSS v. 21 (IBM Corp, 2013). Boron and magnesium exhibited correlation coefficients at or above this threshold (Table 2.1) and were eliminated from further analyses.

To investigate the association between soil properties and genetic structure of rhizobia among sampling sites, we used Mantel tests (Mantel, 1967). Mantel tests compared: 1) genetic distance vs. a composite distance of soil minerals while controlling for geographic distance, 2) genetic distance vs. distance based on soil pH while controlling for geographic distance, and 3) genetic distance vs. geographic distance. We tested soil pH separately from soil mineral variables because previous studies suggest pH as the factor most significantly affecting soil bacterial assemblages (e.g., Fierer and Jackson, 2006). The distance matrices used in Mantel tests were calculated in the following manner. A geographic distance matrix was generated using the linear distance between each pair of sampling sites, calculated with the haversine formula (Rick, 1999) (Appendix D). A genetic distance matrix based on the concatenated data set was generated using MEGA v. 6 (Tamura et al., 2013) and the Tajima-Nei model with gamma (Tamura et al., 2013). A composite soil distance matrix using squared Euclidean distance with a Z-score variance correlation was generated using SPSS (2007) (Appendix E). PASSaGE 2 (Rosenberg and Anderson, 2011) was used to conduct Mantel tests, and we tested for significance of matrix correlations via permutation tests (Rosenberg and Anderson, 2011) with 9999 iterations. One-tailed analyses were done, and significance of matrix correlations was assessed at a p-value < 0.05.
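The geographic distance calculation and the simple (non-partial) Mantel test described above can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of PASSaGE: the Earth radius of 6371 km, the function names, and the example coordinates are all hypothetical, and the partial Mantel tests that control for geographic distance are not shown.

```python
import math
import random


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lower_triangle(matrix, order=None):
    """Flatten the lower triangle, optionally after relabeling the sites."""
    n = len(matrix)
    idx = order if order is not None else list(range(n))
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i)]


def mantel(m1, m2, permutations=9999, seed=0):
    """One-tailed simple Mantel test; returns (r, p-value)."""
    rng = random.Random(seed)
    x = lower_triangle(m1)
    r_obs = pearson(x, lower_triangle(m2))
    n = len(m1)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute rows and columns of m2 together
        if pearson(x, lower_triangle(m2, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical site coordinates (latitude, longitude), for illustration only
coords = [(34.0, -90.0), (33.0, -89.0), (32.0, -90.5), (31.0, -89.5), (34.5, -88.5)]
geo = [[haversine_km(*a, *b) for b in coords] for a in coords]
other = [[2.0 * d for d in row] for row in geo]  # stand-in second matrix
print(mantel(geo, other, permutations=999))
```

Permuting site labels, rather than individual matrix cells, preserves the dependence among distances that share a site, which is why the significance test is done this way.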

Table 2.1 Spearman’s rank correlation between measured soil characteristics.

      pH        P         K         Ca        Mg        S         Na        Fe        Mn        Zn        Cu        B
pH
P     0.067
K     -0.077    0.245
Ca    0.598**   0.014     0.426*
Mg    -0.147    0.311     0.782**   0.294
S     -0.013    0.005     0.188     -0.081    -0.175
Na    -0.248    -0.289    0.398     0.280     0.251     0.230
Fe    0.425*    0.526**   0.209     -0.123    0.318     0.184     0.340
Mn    0.178     0.458*    0.362     0.212     0.404     0.073     0.266     0.390
Zn    0.369     0.557**   0.184     0.358     0.153     0.135     -0.056    0.174     0.378
Cu    0.173     0.478*    0.424*    0.365     0.515*    -0.024    0.018     0.429*    0.481*    0.398
B     0.529**   0.462*    0.422*    0.701**   0.404     -0.062    0.081     0.151     0.555**   0.638**   0.684**

Notes: Significant correlations are indicated by * (p < 0.05) and ** (p < 0.01). Phosphorus (P), Potassium (K), Calcium (Ca), Magnesium (Mg), Sulfur (S), Sodium (Na), Iron (Fe), Manganese (Mn), Zinc (Zn), Copper (Cu), Boron (B).
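The redundancy screen applied to these correlations can be illustrated with the values from Table 2.1 itself. The sketch below lists every pair of variables whose coefficient has an absolute value of at least 0.6 and then checks which pairs remain once magnesium and boron are excluded. The function name and data layout are hypothetical; the coefficients are transcribed from the table.

```python
VARS = ["pH", "P", "K", "Ca", "Mg", "S", "Na", "Fe", "Mn", "Zn", "Cu", "B"]

# Lower triangle of Table 2.1: each row lists the coefficients against the
# variables that precede it in VARS (pH first).
RHO = {
    "P": [0.067],
    "K": [-0.077, 0.245],
    "Ca": [0.598, 0.014, 0.426],
    "Mg": [-0.147, 0.311, 0.782, 0.294],
    "S": [-0.013, 0.005, 0.188, -0.081, -0.175],
    "Na": [-0.248, -0.289, 0.398, 0.280, 0.251, 0.230],
    "Fe": [0.425, 0.526, 0.209, -0.123, 0.318, 0.184, 0.340],
    "Mn": [0.178, 0.458, 0.362, 0.212, 0.404, 0.073, 0.266, 0.390],
    "Zn": [0.369, 0.557, 0.184, 0.358, 0.153, 0.135, -0.056, 0.174, 0.378],
    "Cu": [0.173, 0.478, 0.424, 0.365, 0.515, -0.024, 0.018, 0.429, 0.481, 0.398],
    "B": [0.529, 0.462, 0.422, 0.701, 0.404, -0.062, 0.081, 0.151, 0.555,
          0.638, 0.684],
}


def redundant_pairs(threshold=0.6, exclude=()):
    """Return (row, column, rho) for pairs with |rho| >= threshold."""
    pairs = []
    for row, values in RHO.items():
        # zip stops at the shorter list, pairing each value with its column
        for col, rho in zip(VARS, values):
            if row in exclude or col in exclude:
                continue
            if abs(rho) >= threshold:
                pairs.append((row, col, rho))
    return pairs


print(redundant_pairs())                      # pairs at or above the cutoff
print(redundant_pairs(exclude=("Mg", "B")))   # after dropping Mg and B
```

With the tabulated values, every pair at or above the cutoff involves either magnesium or boron, which is consistent with the decision to drop those two variables.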

CHAPTER III

RESULTS

3.1 Genetic Analyses

For each gene region, sequences were generated for 157 individuals. The aligned length of the nifH data set is 609 nucleotides (~69% coverage of the gene in Bradyrhizobium), and it is 492 nucleotides (~67% coverage of the gene in Bradyrhizobium) for the truA data set. Missing data were present at the beginning and ends of sequences for some individuals in the nifH (22% missing nucleotides) and truA (21% missing nucleotides) data sets. We found a considerable amount of diversity across regions of the total concatenated data set. Sample size varies across regions; the total number of sequences per geographic region ranges from 10 to 35, with the Black Belt Prairie having the most sequences. The number of variable sites (S) ranges from 46 (Delta) to 225 (Black Belt Prairie). The Tombigbee Hills also exhibited a large number of variable sites (S = 223). The number of haplotypes (h) varies considerably among regions, with the Black Belt Prairie having the highest (h = 30) and the Delta having the lowest (h = 11). Haplotype diversity (Hd) is also variable, with a range between 0.923 (North Central Hills) and 1.0 (Jackson Prairie). Black Belt Prairie exhibited the highest amount of nucleotide diversity (π = 0.992) and South Central Hills exhibited the lowest amount (0.066). We found differences in diversity measures when examining each gene separately. The diversity measures are all higher for truA than they are for nifH except in the Black Belt Prairie. Additionally, all of the variation found in the Delta is from truA. Gene diversity statistics are reported for nifH, truA, and the concatenated data in Table 3.1. ANOVA indicated that only the number of haplotypes (h) is significantly different across regions for both data sets (truA: F6, 21 = 2.784, p = 0.05; nifH: F6, 21 = 3.860, p = 0.016). Tukey’s HSD post-hoc test indicated that the Jackson Prairie and Tombigbee have significantly different numbers of truA haplotypes (p = 0.049), and that the number of haplotypes is significantly different between Loess Hills and Tombigbee (p = 0.036). No other significant differences were found in diversity among the regions.

A test using the CADM algorithm (Campbell et al., 2011) did not indicate significant incongruence between the two genes (χ² = 0.000188, p < 0.01). Additionally, when the genes were examined separately in a network, the sequences clustered in similar patterns and with the same individuals across the two genes; only 7 individuals differed. Sequences generated in this project most closely matched Bradyrhizobium sequences in GenBank. We found strong matches to B. elkanii, B. japonicum, B. pachyrhizi, B. cytisi, B. huanghuaihaiense, and B. yuanmingense. The network (fig. 1.2) shows that the sampled rhizobia sequences from C. fasciculata cluster into six primary groups. No group contains a single species of Bradyrhizobium as identified in comparison to the GenBank sequences. Although no Bradyrhizobium sequences from GenBank are represented in network groups four and five, we confirmed in separate BLAST searches that these sequences most closely match other Bradyrhizobium. From group four, R43_5_N1, R52_2_N1, and R66_1_N2 matched Bradyrhizobium genospecies CF1 (JF821007.1) for nifH and Bradyrhizobium genospecies CCBAU E for truA with an E-value = 0.0 and a 100% identity match, and R45_8_N2 matched Bradyrhizobium genospecies ZB34 (JF821057.1) for nifH and Bradyrhizobium genospecies CCBAU E for truA with an E-value = 0.0 and a 100% identity match. From group five, R46_6_N2 matched B. elkanii (KF859889.1) with an E-value = 0.0 and a 91% identity match and Bradyrhizobium genospecies CCBAU for truA with an E-value = 0.0 and a 94% identity match; R65_2_N1 matched Bradyrhizobium genospecies LcCT6 (JF821023.1) with an E-value = 0.0 and a 98% identity match and Bradyrhizobium genospecies CCBAU E for truA with an E-value = 0.0 and a 95% identity match; and R67_4_N2 matched Bradyrhizobium genospecies BtLT4 (JF821040.1) with an E-value = 0.0 and a 99% identity match and Bradyrhizobium genospecies CCBAU E for truA with an E-value = 0.0 and a 100% identity match.

In an AMOVA in which the phylogenetic groups were designated, 61.23% (FST = 0.612; p < 0.001) of the variation was attributed to differences among the groups (Table 1.3). Individual pairwise comparisons based on FST were all substantial and significant, with values ranging from 0.35 to 0.75 and p < 0.001 (Table 1.4).

Although there is phylogenetic structure among the sampled genotypes, there is

not clear evidence of corresponding geographic structure. Only samples from the

Tombigbee Hills are represented in each group. Group 1 has rhizobia sequences from all

seven regions. In group 2, rhizobia are found in the Black Belt Prairie and Tombigbee

Hills. In group 3, rhizobia individuals are from the Black Belt Prairie, Tombigbee Hills,

North Central Hills, Jackson Prairie, and South Central Hills. In group 4, rhizobia

individuals are from the Black Belt Prairie, Tombigbee Hills, North Central Hills, Loess

Hills, Delta, and Jackson Prairie. In group 5, rhizobia individuals are from the Black Belt

Prairie, Tombigbee Hills, North Central Hills, Delta, Jackson Prairie, and the South

Central Hills. In group 6, rhizobia individuals are from the Tombigbee Hills and the Delta

(fig 1.2).

AMOVA revealed significant genetic divergence in rhizobia genotypes among geographic regions. The amount of variation attributed to regional differences is 11.59% (FST = 0.11; p < 0.001) (Table 1.5). Significant pairwise FST values were detected for most pairs of regions as well. The comparisons that did not exhibit significant differences are the Black Belt Prairie vs. the Tombigbee Hills, Loess Hills, and Jackson Prairie; the Tombigbee Hills vs. the Loess Hills and Jackson Prairie; the North Central Hills vs. the South Central Hills; the Loess Hills vs. Jackson Prairie; and the Delta vs. Jackson Prairie (Table 1.6).

Table 3.1 Measures of diversity based on nifH, truA, and the concatenated data set across the seven geographic regions sampled for this study.

Data set  Measure  Black Belt  Tombigbee  North Central  Loess   Delta   Jackson  South Central
                   Prairie     Hills      Hills          Hills           Prairie  Hills
Concat    N        35          27         27             16      16      10       26
          S        225         223        122            170     46      123      105
          h        30          24         14             13      11      10       23
          Hd       0.992       0.991      0.923          0.950   0.950   1        0.991
          π        0.1         0.095      0.046          0.065   0.085   0.072    0.066
truA      N        35          27         27             16      16      10       26
          S        112         125        80             90      46      66       83
          h        20          20         12             10      11      9        17
          Hd       0.96        0.970      0.900          0.870   0.950   0.978    0.960
          π        0.14        0.110      0.06           0.097   0.085   0.118    0.070
nifH      N        35          27         27             16      16      10       26
          S        113         98         42             70      0       57       22
          h        21          16         8              8       0       8        10
          Hd       0.93        0.940      0.7            0.81    0       0.933    0.711
          π        0.076       0.080      0.031          0.050   0       0.048    0.060

Notes: N = number of sequences; S = segregating sites; h = number of haplotypes; Hd = haplotype diversity; π = nucleotide diversity
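As an illustration of how the measures defined in the table notes relate to an alignment, the following is a minimal Python sketch that computes N, S, h, Hd, and π from a set of aligned sequences. It is not the software used in this study. The function name `diversity_stats` and the toy alignment are hypothetical, Hd is computed with the usual sample-size correction n/(n - 1), and the sketch assumes complete sequences (it does not handle the missing data present in the real nifH and truA data sets).

```python
from collections import Counter
from itertools import combinations


def diversity_stats(seqs):
    """Return N, S, h, Hd, and pi for equal-length aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: alignment columns that contain more than one nucleotide state
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # h: number of distinct sequences (haplotypes)
    counts = Counter(seqs)
    h = len(counts)

    # Hd: probability that two randomly drawn sequences differ,
    # with the n / (n - 1) small-sample correction
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    # pi: mean proportion of differing sites over all sequence pairs
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    pi = total_diffs / (len(pairs) * length)

    return {"N": n, "S": s_sites, "h": h, "Hd": hd, "pi": pi}


# Hypothetical toy alignment of four sequences, eight sites each
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "ACGTACGT"]
stats = diversity_stats(toy)
print(stats)
```

In this toy alignment two columns vary and three distinct haplotypes occur among four sequences, so a region can show high haplotype diversity while nucleotide diversity stays modest, which is the kind of contrast visible across regions in the table.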

[Figure 3.1 image: neighbor-net network with clusters labeled Group 1 through Group 6 and a pie chart for each group. Legend of geographic regions: Black Belt, Tombigbee, North Central, Loess Hills, Delta, Jackson Prairie, South Central. GenBank reference sequences labeled in the network include B. sp. CMVU30 KC247138.1, B. elkanii EU418414.1, B. sp. TUXTLAS JF266695.1, B. elkanii KF859889.1, B. sp. CF1 JF821007.1, B. elkanii JQ810094.1, B. elkanii EU622080.1, B. sp. CMW2 JN993734.1, B. sp. TUXTLAS JF266683.1, B. sp. lppb2 JF821027.1, B. japonicum HM107280.1, B. sp. CCBAU HM107281.1, B. arachidis JQ011358.1, B. japonicum EF512283.1, B. sp. SEMIA HQ259529.1, B. sp. LcCT4 JF821022.1, and B. sp. CFRR1 JF821044.1.]

Figure 3.1 Neighbor Network

Notes: Using Hamming distance to depict phylogenetic patterns of rhizobia genotypes nodulating Chamaecrista fasciculata. Genetically similar genotypes were assigned to groups 1-6, and the number of rhizobia individuals found in each group is indicated in the pie charts. Geographic regions are colored. Bradyrhizobium sequences included from GenBank are labeled by their name and sequence accession number. Each branch indicates a unique haplotype; however the names have been removed for ease of viewing.

3.2 Genetic Difference Between Nodules

Our analyses suggest that some C. fasciculata plants harbor highly divergent rhizobia, but in the majority of plants the rhizobia from separate nodules differed by less than 10% in both truA and nifH (Figure 1.3). Using a greater than 6% difference in truA to signal interspecific differences (based on the assessment in Zhang et al. (2012) that included truA sequences), we identified 16 plants in our data set that contained highly divergent rhizobia reflecting potentially different species (Table 1.7). Based on the nifH data set, we identified nine plants containing rhizobia genotypes differing by more than 6% (Table 1.7), which is comparable to the findings of Gaby and Buckley (2014) that most rhizobia species differ by greater than 5% in nifH sequences. Seven plants exhibit concordance between the two data sets in having highly divergent sequences and concatenated genotypes that fall into different groups of the phylogenetic network (fig. 1.2). The remaining 11 plants exhibit discordance in divergence estimates based on the two genes. Thus, 14% of the plants that contained multiple nodules are symbiotic with different rhizobia species.
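The comparison described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the procedure used in the study: the function names `percent_difference` and `classify_plant` are hypothetical, sites with missing data (`N`, `-`, `?`) are simply skipped, and the 6% threshold is the value quoted in the text. The three example plants use percent differences reported in the table of nodule comparisons.

```python
def percent_difference(seq_a, seq_b, missing="N-?"):
    """Percent of compared sites that differ between two aligned sequences.

    Sites where either sequence has a missing-data symbol are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in missing or b in missing:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


def classify_plant(tru_a_diff, nif_h_diff, threshold=6.0):
    """Label a plant by whether its two nodules exceed the threshold."""
    tru_a_divergent = tru_a_diff > threshold
    nif_h_divergent = nif_h_diff > threshold
    if tru_a_divergent and nif_h_divergent:
        return "divergent in both genes"
    if tru_a_divergent or nif_h_divergent:
        return "divergent in one gene"
    return "not divergent"


# Percent differences (truA, nifH) as reported for three sampled plants
examples = {"R46_6": (17.7, 15.11), "R40_1": (11.95, 1.3), "R40_2": (0.7, 0.0)}
labels = {plant: classify_plant(*diffs) for plant, diffs in examples.items()}
print(labels)
```

Plants labeled "divergent in both genes" correspond to the concordant cases discussed above, and "divergent in one gene" to the discordant cases.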

3.3 Comparison Between Genetic and Environmental Data

The partial Mantel tests showed significant associations between genetic distance and distance based on soil properties. Genetic distance is correlated with soil distance when controlling for geographic distance (R2 = 0.44, t = 2.63, right-tailed p = 0.027), as well as with distance based solely on pH (R2 = 0.30, t = 2.71, right-tailed p = 0.0049), but genetic distance was not correlated with geographic distance (R2 = -0.05, t = -0.67, p > 0.05).
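For readers unfamiliar with the method, the logic of a partial Mantel test can be sketched in Python as below. This is a generic permutation version under stated assumptions and is not the software or exact statistic used in this study (which reports R2 and t values): the function names, the toy pH values, and the toy site coordinates are all hypothetical, and the p-value is one-tailed (right-tailed), obtained by permuting the site labels of the genetic distance matrix.

```python
import math
import random


def upper(matrix):
    """Flatten the upper triangle (excluding the diagonal) of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(a, b, c):
    """Correlation between a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def partial_mantel(gen, env, geo, permutations=999, seed=0):
    """Right-tailed partial Mantel test of gen vs. env, controlling for geo."""
    rng = random.Random(seed)
    n = len(gen)
    env_v, geo_v = upper(env), upper(geo)
    observed = partial_r(upper(gen), env_v, geo_v)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of the genetic matrix together
        permuted = [[gen[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if partial_r(upper(permuted), env_v, geo_v) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy data for five sites: soil pH and position along a transect (km)
ph = [5.0, 5.5, 6.8, 7.2, 7.9]
x_km = [0, 40, 10, 90, 55]
env = [[abs(a - b) for b in ph] for a in ph]
geo = [[abs(a - b) for b in x_km] for a in x_km]
# Toy genetic distances built as a rescaled copy of the pH distances
gen = [[0.02 * d for d in row] for row in env]

r, p = partial_mantel(gen, env, geo)
print(round(r, 3), p)
```

Because the toy genetic distances are constructed as a rescaled copy of the pH distances, the partial correlation is 1 by construction; real data would give weaker correlations such as those reported above.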

Table 3.2 Results from an AMOVA of rhizobia genotypes defined by network groups

Source of variation  d.f.  Sum of squares  Variance components  Percent variation
Among groups         5     3551.910        31.27342             61.23
Within groups        150   2970.096        19.80064             38.77
Total                155   6522.006        51.07407

Fixation index FST = 0.61232, p < 0.001

Notes: Significance p < 0.05. Degrees of freedom are indicated by d.f.

Table 3.3 Pairwise FST values by phylogenetic group.

         Group 1  Group 2  Group 3  Group 4  Group 5
Group 1  0
Group 2  0.661*   0
Group 3  0.421*   0.588*   0
Group 4  0.751*   0.752*   0.691*   0
Group 5  0.680*   0.651*   0.599*   0.356*   0
Group 6  0.555*   0.565*   0.712*   0.697*   0.637*

Notes: Significant values are indicated by *, with all of the p-values ≤ 0.001.

Table 3.4 AMOVA by geographic region

Source of variation  d.f.  Sum of squares  Variance components  Percent variation
Among regions        6     885.813         4.99142              11.59
Within regions       150   5710.212        38.06808             88.41
Total                156   6596.025        43.05950

Fixation index FST = 0.11592, p < 0.001

Notes: Significant p-value < 0.05. Degrees of freedom are indicated by d.f.
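The percent variation and fixation index in the two AMOVA tables follow directly from the reported variance components: FST is the among-group component divided by the total variance. A minimal Python check, with the helper name `amova_summary` being hypothetical and the input values copied from the tables:

```python
def amova_summary(var_among, var_within):
    """Percent variation and FST from two AMOVA variance components."""
    total = var_among + var_within
    return {
        "pct_among": 100 * var_among / total,
        "pct_within": 100 * var_within / total,
        "fst": var_among / total,
    }


# Variance components reported for network groups and for geographic regions
groups = amova_summary(31.27342, 19.80064)
regions = amova_summary(4.99142, 38.06808)

for name, result in (("network groups", groups), ("geographic regions", regions)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

The contrast between the two results (roughly 61% of variation among network groups versus roughly 12% among regions) is the basis for the statement that phylogenetic structure is strong while geographic structure is weak.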

Table 3.5 Pairwise FST values by region.

               Black Belt  Tombigbee  North Central  Loess   Delta   Jackson
Black Belt     0
Tombigbee      0.018       0
North Central  0.130*      0.103*     0
Loess          0.032       0.026      0.051          0
Delta          0.108*      0.114*     0.427*         0.243*  0
Jackson        0.027       0.058      0.278*         0.122*  0.050   0
South Central  0.112*      0.082*     0.010          0.071*  0.369*  0.194*

Notes: Significance (p < 0.05) is indicated by *.

Table 3.6 Variation in genotypes of rhizobia isolated from separate nodules on single host plants.

Plant sample  truA percent  nifH percent  Network group   Network group
              difference    difference    nodule 1 (N1)   nodule 2 (N2)
R40_1         11.95         1.3           Group 2         Group 2
R40_2         0.7           0             Group 1         Group 1
R40_3         0             6.4           Group 2         Group 2
R40_4         0             0.33          Group 2         Group 2
R41_2         0             0.33          Group 1         Group 1
R41_4         0.88          1.1           Group 4         Group 4
R41_5         7.74          0.82          Group 1         Group 1
R41_6         0             0.65          Group 1         Group 1
R43_5         3.76          0.95          Group 4         Group 4
R45_2         15.49         1.1           Group 3         Group 1
R45_6         0             0.82          Group 4         Group 4
R46_2         2.65          0             Group 1         Group 1
R46_3         6.64          1.2           Group 1         Group 1
R46_4         0             0.82          Group 1         Group 1
R46_5         7.25          0.17          Group 1         Group 1
R46_6         17.7          15.11         Group 2         Group 5
R47_3         17.48         14.45         Group 4         Group 2
R47_5         0             0.82          Group 4         Group 4
R49_2         0.22          0.33          Group 4         Group 4
R49_4         0             0             Group 4         Group 4
R49_6         1.4           0             Group 4         Group 4
R49_9         5.53          0.66          Group 4         Group 4
R52_4         7.97          0.16          Group 1         Group 1
R55_1         0             0.82          Group 4         Group 4
R55_2         3.09          2.5           Group 5         Group 5
R55_3         0             0             Group 4         Group 4
R55_4         0.22          0             Group 4         Group 4
R55_5         0             0             Group 4         Group 4
R55_6         3.32          0.82          Group 5         Group 5
R56_2         1.77          0             Group 1         Group 1
R57_2         0.44          0             Group 4         Group 4
R57_4         3.98          0.82          Group 4         Group 4
R58_1         3.32          0.82          Group 1         Group 1
R58_4         0.22          1.3           Group 1         Group 1
R58_7         0.22          0.33          Group 1         Group 1
R59_4         6.86          0.82          Group 4         Group 5
R59_5         2.43          0.84          Group 1         Group 1
R60_2         8.19          0             Group 6         Group 6
R64_1         6.42          10.15         Group 1         Group 4
R64_6         6.19          0.66          Group 4         Group 4
R65_1         5.31          2.13          Group 4         Group 5
R65_4         16.15         9.68          Group 5         Group 1
R66_5         0             0             Group 5         Group 5
R66_6         0             0             Group 4         Group 4
R67_1         0             8.87          Group 5         Group 3
R68_3         14.83         6.91          Group 5         Group 1
R69_2         0             7.39          Group 4         Group 3
R69_3         0             0.16          Group 4         Group 4
R69_4         1.11          0.16          Group 4         Group 4
R69_5         16.59         10.34         Group 4         Group 1
R69_6         13.27         8.54          Group 4         Group 1

Notes: Indicated are rhizobia across nodules exhibiting greater than 5% nucleotide differences. RXX represents the sampling site, followed by a number assigned to an individual. The network group of each genotype is indicated.

Figure 3.2 Graphical comparison of the pairwise genetic distances

Notes: Between nifH and truA sequences in rhizobia isolated from the same plant.

CHAPTER IV

DISCUSSION

4.1 Rhizobia Diversity and Legume Specificity

Genetic structure of nodulating rhizobia is thought to be influenced by co-evolutionary processes involving selection by hosts. Thus, symbiotic genes that are directly involved in the process of nodulation and perhaps nitrogen fixation may show different genetic patterns than housekeeping genes, which are commonly shared across microbes (Silva et al., 2005). Specifically, housekeeping genes are known to be highly conserved and therefore are expected to reflect the evolutionary history of lineages, whereas symbiotic genes may more likely reflect unique aspects of the symbiotic activities, such as selection by plant hosts or horizontal gene transfer (Silva et al., 2005). For example, Bradyrhizobium species from the Americas, Asia, and Australia show a random distribution of 16S haplotypes, whereas nifD haplotypes were distributed according to geography (Parker et al., 2002). Similarly, Silva et al. (2005) found that 16S sequences did not exhibit phylogenetic structure, whereas the symbiotic genes nifH and nodB showed distinct phylogeographic patterns in rhizobia from the Americas, Asia, and Europe. In addition to the fact that symbiotic genes may not track evolutionary divergence because they are rapidly changing, horizontal gene transfer of symbiotic genes can also occur between bacteria in close contact, resulting in inconsistent phylogenetic patterns relative to housekeeping genes that are not transferred.

Given the contrasting patterns often revealed by housekeeping and symbiotic genes, understanding biogeographic patterns of microbes can be difficult. In this study, we found similar phylogenetic structure of nodulating rhizobia genotypes isolated from C. fasciculata with the symbiotic gene nifH and the housekeeping gene truA. This finding suggests a lack of extensive horizontal gene transfer among the sampled isolates and that the genes depict common processes that have resulted in the observed diversity.

Using the concatenated data, we found that nodulating rhizobia associated with C.

fasciculata exhibited a substantial amount of diversity. In comparison to the only other

published studies to have characterized symbiotic rhizobia in C. fasciculata (Parker et al.,

2006; Parker, 2012), which only identified Bradyrhizobium elkanii symbionts, our results

suggest that C. fasciculata has a much wider array of rhizobia types with which it can

form symbioses. We identified close sequence matches with B. elkanii, B. japonicum, B.

pachyrhizi, B. cytisi, B. huanghuaihaiense, and B. yuanmingense. Parker et al. (2006) and

Parker (2012) only studied populations from Connecticut and North Carolina,

respectively, so it is possible that there are defined intraspecific differences in rhizobia

associated with local adaptation that would only be revealed by a more comprehensive

survey of rhizobia across the range of C. fasciculata.

In comparison to other studies where intraspecific variation in nodulating rhizobia

has been characterized, we found comparable or higher diversity in nodulating rhizobia.

Appunu et al. (2008) identified only three species of Bradyrhizobium associated with

soybean, Glycine max, across nine regions in India. Paffetti et al. (1996) found Medicago

sativa varieties were symbiotic with only Rhizobium meliloti, but they identified 96

unique strains across two geographically distinct areas in Italy. In Vicia faba from three

ecological regions in China, many strains or isolates were found, but the majority belonged to Rhizobium leguminosarum, and a small minority (six individuals) belonged to an unnamed Rhizobium species (Tian et al., 2007).

Many Bradyrhizobium species are considered to be generalists because they nodulate many wild legume species from different genera (Koppell and Parker, 2012; Ehinger et al., 2014) and some agriculturally important species that are widely planted, such as G. max (Appunu et al., 2008). Koppell and Parker (2012) analyzed biogeographic structure of the genus Bradyrhizobium using five housekeeping genes and nifD, a symbiotic gene, from 41 legume genera that spanned an area from Alaska to Panama. They found little signal of regional endemism at the continental scale, and super clades in a phylogenetic analysis of B. elkanii and B. japonicum spanned all sampled regions. However, when narrowed to a local scale, they found distinctive bacterial populations that spanned multiple scales (Koppell and Parker, 2012). They concluded that the genetic structure of Bradyrhizobium varies locally across regions and that Bradyrhizobium was associated with diverse legume hosts (Koppell and Parker, 2012).

Our finding that multiple Bradyrhizobium forms associate with C. fasciculata is not surprising, as generalist rhizobia often occupy a greater diversity of environments than specialist rhizobia (Ehinger et al., 2014). Examination of individual plants for symbiosis with multiple rhizobia across nodules also indicates relaxed specificity of symbioses in C. fasciculata. In at least seven cases, plants we sampled contained rhizobia that would be considered different species by bacterial standards (Table 1.7; Zhang et al., 2012; Gaby and Buckley, 2014). In summary, these results suggest that C. fasciculata is a generalist that is capable of forming symbioses with a wide range of Bradyrhizobium genotypes. This finding runs counter to the idea that legume-rhizobia relationships are highly specific (Kouchi et al., 2010), where for one species of legume there is often a specific rhizobia species that nodulates it (Hirsch et al., 2001). That idea may apply to restricted host species, but many widely distributed host species are commonly reported to exhibit broad symbioses (e.g., Bennett, 1999; Ndlovu et al., 2013; Zahran, 2001). In fact, Zahran (2001) states that wild, widespread legumes are often more promiscuous and have a wider range of symbiotic partners. The success of some invasive legume species may lie in their ability to form symbioses with a wide range of symbionts, as reported for an invasive legume in Africa (Ndlovu et al., 2013). As legume success is highly dependent on the functionality of specific rhizobia genotypes (Heath et al., 2010), generalist or specific symbioses could dictate legume range. This result could have important implications for generalist legumes and rhizobia that could be used across variable landscapes, soil types, and climate zones, and that are important for agriculture, natural ecosystems, and industry (Rincon et al., 2007).

4.2 Genetic Structure of Symbiotic Rhizobia

I hypothesized that rhizobia genotypes would be geographically structured. The phylogenetic network (fig. 1.2) and associated AMOVA (Tables 1.5 and 1.6) clearly indicated that there are divergent genotypes of Bradyrhizobium associated with C. fasciculata. When we tested for genetic divergence among the network groups, we identified significant differences among genotypes assigned to each of the groups (FST = 0.61; p < 0.001; Table 1.3). Thus, the groups we designated based on the network are clearly differentiated. Moreover, we found that all of the network groups were significantly different from one another (p < 0.001). Despite this clear phylogenetic structure within the data set, we do not see strong evidence that it is associated with the geography of the sampled locations. Although AMOVA indicated significant structure in rhizobia genetic diversity by region (FST = 0.1159; p < 0.001; Table 1.5), substantially more of the variation across genotypes is found within the regions. Upon further analysis of pairwise regional comparisons, we found significant pairwise FST values between many, but not

all, regions (Table 1.6). We also did not find a significant correlation between geographic

and genetic distance in the Mantel test. When examining the phylogenetic network (fig.

1.2), individuals are not represented in equal numbers across groups. Only genotypes

from Tombigbee Hills are represented in each group. Some geographic regions, such as

Black Belt Prairie, and Tombigbee Hills, support greater rhizobia diversity and are

represented in multiple groups of the network (fig. 1.2). These patterns are also supported

by measures of diversity of nodulating rhizobia from each region. By contrast, North

Central Hills cover the most physical area yet this region does not seem to host as many

types of nodulating rhizobia. The North Central Hills is found in four of the six network

groups with the vast majority of individuals clustering in group four (20 individuals).

Also, only four network groups contain rhizobia individuals from the Delta; with the

majority of the individuals in group one. The nucleotide diversity of the Delta is

relatively high at 0.085, but the Delta has a relatively low number of segregating sites (S

= 46). Individuals present across five network groups represent the North Central Hills,

yet the nucleotide diversity is the same as the Delta at 0.085. The North Central Hills has

an intermediate number of segregating sites (S = 122). Jackson Prairie, and South Central

Hills, are represented by individuals across multiple network groups and have nucleotide

diversities less than 0.073. These two regions also have intermediate number of variable

sites (Jackson Prairie S = 123 and South Central Hills S = 105). In summary, these results indicate that geography explains some of the phylogenetic patterns we observed, but there are likely factors within regions that also contribute to rhizobia diversity on a much finer scale. To fully understand the drivers of diversity at coarse geographic scales, wider sampling may be needed.

The lack of strong geographic structure in rhizobia associated with C. fasciculata is consistent with studies on soil bacteria, which have often indicated that their genetic structure is largely independent of geographic distance (Fierer and Jackson, 2006). However, our results are inconsistent with some other studies of rhizobia-legume systems. For example, Parker and Spoerke (1998) found that rhizobia diversity was significantly associated with distance between sampling sites of Amphicarpaea bracteata at a 1000 km scale. Paffetti et al. (1996) also found significant genetic structure in rhizobia across geographically separated populations. Strong genetic differentiation of rhizobia symbionts was found among Vicia cracca populations that are separated by a few kilometers and among regions that are separated by 50 to 350 kilometers (Van Cauwenberghe et al., 2014).

Given that a strict isolation-by-distance pattern did not satisfactorily explain the phylogenetic structure of rhizobia isolated from C. fasciculata, other factors must be considered, including traits of the environment and plant hosts. Diversity and composition of soil microbial communities are often dictated by soil properties. Among these, pH has been found to have a strong effect (Chong et al., 2012; Fierer and Jackson, 2006) because many bacteria are limited in their ability to survive in basic or acidic soils. For certain rhizobia, soil properties have also been found to influence their presence and diversity. For example, highly acidic soils show less rhizobia diversity than soils where the pH has been artificially increased with the addition of lime (Andrade et al., 2002). Soil pH and site elevation were found to correlate with diversity of Mesorhizobium symbionts (Lemaire et al., 2015). Martir et al. (2005) found that phosphorus and sodium explained rhizobia variation between sites. Li et al. (2011) investigated rhizobia partners of G. max in China and found that Bradyrhizobium and Sinorhizobium were the two major symbionts and that genetic structure of rhizobia was correlated with soil pH. Specifically, B. japonicum and B. elkanii strains were found only in soils that were close to neutral, whereas B. yuanmingense, B. liaoningense, and Sinorhizobium strains were found in basic soils (Li et al., 2011). Thus, at a regional scale, the biogeographic patterns and genetic diversity of Bradyrhizobium can be seen to respond to geography and soil factors (Li et al., 2011).

Similar to these studies, our results indicate that genetic structure in rhizobia associated with C. fasciculata may be influenced by soil characteristics. The Mantel tests revealed a significant correlation between the composite environmental distance and genetic distance as well as distance based only on pH and genetic distance. These results suggest that many properties of the soil can affect availability of nodulating rhizobia of

C. fasciculata. Nevertheless, the correlation coefficients were not high, which suggests that other factors could also be important in explaining the observed structure. Low correlation values between soil characteristics and rhizobia diversity could also be indicative of widespread bacteria that can thrive in multiple soil environments (Fierer and

Jackson, 2006). Given that we found wide variation in soil minerals of sites considered to be in the same physiographic region, our sampling approach may not have adequately

captured the continuum of environmental variation to pinpoint specific variables of the soil and their relative influence on rhizobia diversity.

Plant host has also been identified as a major factor contributing to genetic structure among nodulating rhizobia populations because legumes are highly selective of their rhizobia partners (Hirsch et al., 2001). Host and rhizobia engage in signaling prior to the establishment of nodules (Yang et al., 2010). It is at this point that a plant decides whether or not to become symbiotic, and this choice is genetically controlled. Some flavonoids, which are produced by the plant, function in mediating host specificity by inducing nod genes of certain rhizobia species while inhibiting the nod genes of other

species (Hirsch et al., 2001). Variation in plant genes, such as the genes, Rj2, Rj4, and

Rfg1 in soybeans, can restrict nodulation to specific strains of rhizobia (Yang et al.,

2010). This suggests that plant hosts may be the more active partners in determining an

appropriate symbiotic partner. Other evidence for host selection of rhizobia has been

found in ecological studies. For example, Sachs et al. (2009) found that the rhizobia

housed in nodules were a subset of those in the surrounding soil, indicating a strong role

for plant host to choose particular rhizobia genotypes. We did not characterize rhizobia in

soil samples, thus it is not known if the nodulating rhizobia represent the whole or a

subset of soil rhizobia. Because we also did not test for reciprocal genetic structure in the

plant host or test the functionality of associations, we cannot identify the most important

factors in explaining the structure observed. Given the signal that has been identified

between genetic distance and environmental traits, we expect that it is likely some

combination of characteristics that ultimately contribute to the establishment of

successful nodules housing rhizobia.

4.3 Conclusions

We found that the suite of rhizobia that are symbiotic with C. fasciculata is much broader than previously identified. This is consistent with the notion that widespread legumes are often more promiscuous (Zahran, 2001). Through this study, we also found that the rhizobia partners of an individual plant can vary widely at the genotypic level. These two results call into question what is meant by specificity in legume-rhizobia symbioses. As we collected legumes and rhizobia across regions in Mississippi, we were able to identify the amount of variation found in the nodulating rhizobia across multiple scales. We found that there is a tremendous amount of variation in nodulating rhizobia associated with C. fasciculata across Mississippi, at regional levels, and even within sampling sites. Geographic distance and associated variation in soil properties contribute to the diversity and structure of nodulating rhizobia of C. fasciculata, but it is likely that other factors, particularly host genotype, are also important in explaining variation in these symbioses. It is expected that the genetic diversity and structure identified in this study reflect adaptive variation between particular hosts and their symbionts, but this should be confirmed with additional studies that characterize fitness differences among divergent rhizobia genotypes.

Legumes are considered important to agriculture and natural ecosystem functioning because of their nitrogen-fixing capacity, which is dependent upon their ability to form symbioses with rhizobia. While the importance of both legumes and rhizobia is recognized, few studies have investigated rhizobia associates in natural populations or identified the extent to which environmental factors affect rhizobia communities. The literature suggests that local adaptation of both the legume and

rhizobia are important factors that contribute to coevolution in these systems (Koppell and Parker, 2012). As C. fasciculata has a wide distribution and occupies a diversity of habitats, it is an ideal system in which to investigate the breadth of the effects of geography and environmental variables on the establishment of legume-rhizobia symbioses. Ecosystem functioning is highly dependent on microorganisms (Bossio et al.,

1997). Therefore, understanding the assemblages of the soil microbe, rhizobia, associated with C. fasciculata could aid in predicting how legumes and rhizobia respond to environmental perturbations. Characterizing microbial communities and rhizobia assemblages could also aid in determining ecosystem processes that play significant roles in agriculture and conservation (Bossio et al., 1997). This symbiosis could be heavily dependent on habitat and geography, thereby providing the opportunity for highly variable relationships. If there is variation in rhizobia relationships, then this could have applied significance in the success of different strains throughout eastern North America.

REFERENCES

Ahn, K. S., Ha, U., Jia, J., Wu, D., and Jin, S. (2004). The truA gene of Pseudomonas aeruginosa is required for the expression of type III secretory genes. Microbiology, 150(3), 539-547.

Andrade, D., Murphy, P. J., and Giller, K. E. (2002). The diversity of Phaseolus nodulating rhizobial populations is altered by liming of acid soils planted with Phaseolus vulgaris L. in Brazil. Applied and Environmental Microbiology, 68(8), 4025-4034.

Appunu, C., N’Zoue, A., and Laguerre, G. (2008). Genetic diversity of native Bradyrhizobium isolated from soybeans (Glycine Max L.) in different agricultural-ecological-climatic regions of India. Applied and Environmental Microbiology, 74(14), 5991-5996.

Balachandar, D., Raja, P., Kumar, K., and Sundaram, S. P. (2007). Non-rhizobial nodulation in legumes. Biotechnology and Molecular Biology Review, 2, 49-57.

Barton, L., and Northup, D. (2011). Microbial Ecology. New Jersey: Wiley-Blackwell.

Becking, B. (1934). Geobiologie of inleiding tot de milieukunde Diligentia Wetensch. Serie, 18, 19.

Bena, G., Lyet, A., Huguet, T., and Olivieri, I. (2005). Medicago–Sinorhizobium symbiotic specificity evolution and the geographic expansion of Medicago. Journal of Evolutionary Biology, 18(6), 1547-1558.

Bennett, S., and Cocks, P. S. (Eds.). (1999). Genetic Resources of Mediterranean Pasture and Forage Legumes (Vol. 33). Springer Science and Business Media.

Bilofsky, H. S., and Christian, B. (1988). The GenBank® genetic sequence data bank. Nucleic Acids Research, 16(5), 1861-1863.

Bossio, D., Scow, K., Gunapala, N., and Graham, K. (1997). Determinants of soil microbial communities: Effects of agricultural managements, seasons, and soil type on phospholipid fatty acid profiles. Microbial Ecology, 36, 1-12.

Bryant, D., and Moulton, V. (2004). Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Molecular Biology and Evolution, 21(2), 255-265.

Campbell, V., Legendre, P., and Lapointe, F. (2011). The performance of the Congruence Among Distance Matrices (CADM) test in phylogenetic analysis. BMC Evolutionary Biology, 11(64), doi:10.1186.

Chong, C., Pearce, D., Convey, P., Yew, W., and Tan, I. (2012). Patterns in the distribution of soil bacterial 16S rRNA gene sequences from different regions of Antarctica. Geoderma, 181, 45-55.

Cronk, Q., Ojeda, I., and Pennington, R. (2006). Legume comparative genomics: progress in phylogenetics and phylogenomics. Plant Biology, 9, 99-103.

Denison, R., and Kiers, E. (2004). Lifestyle alternatives for rhizobia: mutualism, parasitism, and forgoing symbiosis. FEMS Microbiology Letters, 237, 187-193.

Dimmitt, M. (2014). Fabaceae (legume family). Arizona-Sonora Desert Museum. Retrieved January 7, 2014, from www.desertmuseum.org/books/nhsd_fabaceae.php.

Ehinger, M., Mohr, T. J., Starcevich, J. B., Sachs, J. L., Porter, S. S., and Simms, E. L. (2014). Specialization-generalization trade-off in a Bradyrhizobium symbiosis with wild legume hosts. BMC Ecology, 14(1), 1-18.

Excoffier, L., and Lischer, H. E. L. (2010). Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Molecular Ecology Resources, 10, 564-567.

Excoffier, L., Smouse, P. E., and Quattro, J. M. (1992). Analysis of molecular variance inferred from metric distances among DNA haplotypes: application to human mitochondrial DNA restriction data. Genetics, 131(2), 479-491.

Fierer, N., and Jackson, R. (2006). The diversity and biogeography of soil bacterial communities. Proceedings of the National Academy of Sciences, USA, 103, 626-631.

Fierer, N., Bradford, M. A., and Jackson, R. B. (2007). Toward an ecological classification of soil bacteria. Ecology, 88(6), 1354-1364.

Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh, United Kingdom: Oliver and Boyd.

Gaby, J. C., and Buckley, D. H. (2012). A comprehensive evaluation of PCR primers to amplify the nifH gene of nitrogenase. PLoS ONE, 7(7), e42149.

Gene Codes Corporation. (2006). Sequencher version 4.7 sequence analysis software. Ann Arbor, Michigan, USA.

Graham, P. H., and Vance, C. P. (2003). Legumes: importance and constraints to greater use. Plant Physiology, 131(3), 872-877.

Griffiths, R., Thomson, B., James, P., Bell, T., Bailey, M., and Whiteley, A. (2011). The bacterial biogeography of British soils. Environmental Microbiology, 13(6), 1642-1654.

Hagen, M. J., and Hamrick, J. L. (1996). Population level processes in Rhizobium leguminosarum bv. trifolii: the role of founder effects. Molecular Ecology, 5(6), 707-714.

Heath, K. D. (2010). Intergenomic epistasis and co-evolutionary constraint in plants and rhizobia. Evolution, 64(5), 1446-1458.

Hirsch, A. M., Lum, M. R., and Downie, J. A. (2001). What makes the rhizobia-legume symbiosis so special? Plant Physiology, 127(4), 1484-1492.

Huson, D., and Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23(2), 254-267.

IBM Corp. Released 2012. IBM SPSS Statistics for Windows, Version 21.0. Armonk, NY: IBM Corp.

Konstantinidis, K. T., and Tiedje, J. M. (2005). Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences, USA, 102(7), 2567-2572.

Kiers, E., Rousseau, R., West, S., and Denison, R. (2003). Host sanctions and the legume-rhizobium mutualism. Nature, 425, 78-81.

Koppell, J. H., and Parker, M. A. (2012). Phylogenetic clustering of Bradyrhizobium symbionts on legumes indigenous to North America. Microbiology, 158, 2050-2059.

Kouchi, H., Inaizumi-Anraku, H., Hayashi, M., Hakoyama, T., Nakagawa, T., Umehara, Y., Suganuma, N., and Kawaguchi, M. (2010). How many peas in a pod? Legume genes responsible for mutualistic symbioses underground. Plant and Cell Physiology, 51, 1381-1397.

Laguerre, G., Nour, S. M., and Macheret, V. (2001). Classification of rhizobia based on nodC and nifH gene analysis reveals a close phylogenetic relationship among Phaseolus vulgaris symbionts. Microbiology, 147, 981–993.

Lauber, C., Strickland, M., Bradford, M., and Fierer, N. (2008). The influence of soil properties on the structure of bacterial and fungal communities across land-use types. Soil Biology and Biochemistry, 40, 2407-2415.

Legendre, P., and Lapointe, F. (2004). Assessing congruence among distance matrices: Single malt Scotch whiskies revisited. Australian and New Zealand Journal of Statistics, 46, 615-629.

Lemaire, B., Dlodlo, O., Chimphango, S., Stirton, C., Schrire, B., Boatwright, J. S., and Muasya, A. M. (2014). Symbiotic diversity, specificity and distribution of rhizobia in native legumes of the core Cape Subregion (South Africa). FEMS Microbiology Ecology, 91(2), 2-42.

Li, Q. Q., Wang, E. T., Zhang, Y. Z., Zhang, Y. M., Tian, C. F., Sui, X. H., and Chen, W. X. (2011). Diversity and biogeography of rhizobia isolated from root nodules of Glycine max grown in Hebei Province, China. Microbial Ecology, 61(4), 917-931.

Librado, P., and Rozas, J. (2009). DnaSP v5: A software for comprehensive analysis of DNA polymorphism data. Bioinformatics, 25, 1451-1452.

Lohar, D., and VandenBosch, K. (2005). Grafting between model legumes demonstrates role for roots and shoots in determining nodule type and host/rhizobia specificity. Journal of Experimental Botany, 56, 1643-1650.

Maddison, W. P., and Maddison, D. R. (2015). Mesquite: a modular system for evolutionary analysis. Version 3.02. http://mesquiteproject.org.

Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Research, 27, 209-220.

Martin, D. P., Posada, D., Crandall, K. A., and Williamson, C. (2005). A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Research and Human Retroviruses, 21, 98-102.

Martin, D., and Rybicki, E. (2000). RDP: detection of recombination amongst aligned sequences. Bioinformatics, 16, 562-563.

Martínez-Romero, E., and Caballero-Mellado, J. (1996). Rhizobium phylogenies and bacterial genetic diversity. Critical Reviews in Plant Science, 15, 113–140.

McInnes, A., Thies, J. E., Abbott, L. K., and Howieson, J. G. (2004). Structure and diversity among rhizobial strains, populations and communities–a review. Soil Biology and Biochemistry, 36(8), 1295-1308.

MDAH: Mississippi Department of Archives and History. (2015). Regions. Retrieved on March 29, 2015. http://trails.mdah.ms.gov/regions.htm.

Michiels, J., Dombrecht, B., Vermeiren, H., Xi, C., Luyten, E., and Vanderleyden, J. (1998). Phaseolus vulgaris is a non-selective host for nodulation. FEMS Microbiology Ecology, 26, 193-205.

Ndlovu, J., Richardson, D., Wilson, J., and Roux, J. (2013). Co-invasion of South African ecosystems by an Australian legume and its rhizobial symbionts. Journal of Biogeography, 40, 1240-1251.

Padidam, M., Sawyer, S., and Fauquet, C. (1999). Possible emergence of new geminiviruses by frequent recombination. Virology, 265, 218-225.

Paffetti, D., Scotti, C., Gnocchi, S., Fancelli, S., and Bazzicalupo, M. (1996). Genetic diversity of an Italian Rhizobium meliloti population from different Medicago sativa varieties. Applied and Environmental Microbiology, 62(7), 2279-2285.

Parker, M. (1999). Mutualism in metapopulations of legumes and rhizobia. The American Naturalist, 153, 48-60.

Parker, M., and Spoerke, J. (1998). Geographic structure of lineage associations in a plant-bacterial mutualism. Journal of Evolutionary Biology, 11, 549-562.

Parker, M., Lafay, B., Burdon, J. J., and Van Berkum, P. (2002). Conflicting phylogeographic patterns in rRNA and nifD indicate regionally restricted gene transfer in Bradyrhizobium. Microbiology, 148, 2557–2565.

Parker, M., and Kennedy, D. A. (2006). Diversity and relationships of Bradyrhizobium from legumes native to eastern North America. Canadian Journal of Microbiology, 52, 1148-1157.

Pasternak, Z., Al-Ashhab, A., Gatica, J., Gafny, R., Avraham, S., Minz, D., Gillor, O., and Jurkevitch, E. (2013). Spatial and temporal biogeography of soil microbial communities in arid and semiarid regions. PLoS ONE, 8, e69705.

Peoples, M., and Herridge, D. (1990). Nitrogen fixation by legumes in tropical and subtropical agriculture. Advances in Agronomy, 44, 156-161.

Posada, D., and Crandall, K. A. (2001). Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proceedings of the National Academy of Sciences, USA, 98, 13757-13762.

Pueppke, S., and Broughton, J. (1999). Rhizobium sp. strain NGR234 and R. fredii USDA257 share exceptionally broad nested host ranges. Molecular Plant-Microbe Interactions, 12, 293-318.

Rambaut, A. (1996). Se-Al: Sequence alignment editor software. Available from http://tree.bio.ed.ac.uk/software/seal. Accessed 23 May 2012.

Rick, D. (1999). Deriving the haversine formula. In The Math Forum, Accessed 21 March 2015. http://mathforum.org/library/drmath/view/51879.html.

Rincón, A., Arenal, F., González, I., and Manrique, E. (2007). Diversity of rhizobial bacteria isolated from nodules of gypsophyte Ononis tridentata L. growing in Spanish soils. Microbial Ecology, 56, 223-233.

Rosenberg, M. S., and Anderson, C. D. (2011). PASSaGE: Pattern Analysis, Spatial Statistics and Geographic Exegesis. Version 2. Methods in Ecology and Evolution, 2(3), 229-232.

Rout, M. E., and Callaway, R. M. (2012). Interactions between exotic invasive plants and soil microbes in the rhizosphere suggest that ‘everything is not everywhere’. Annals of Botany, 110(2), 213-222.

Sachs, J. L., Kembel, S. W., Lau, A. H., and Simms, E. L. (2009). In situ phylogenetic structure and diversity of wild Bradyrhizobium communities. Applied and Environmental Microbiology, 75(14), 4727-4735.

Silva, C., Vinuesa, P., Eguiarte, L. E., Souza, V., and Martinez-Romero, E. (2005). Evolutionary genetics and biogeographic structure of Rhizobium gallicum sensu lato, a widely distributed bacterial symbiont of diverse legumes. Molecular Ecology, 14(13), 4033-4050.

Smith, R. A. (1971). The effect of unequal group size on Tukey's HSD procedure. Psychometrika, 36(1), 31-34.

Spearman, C. (1907). Demonstration of formulae for true measurement of correlation. The American Journal of Psychology, 161-169.

SPSS Inc. (2007). SPSS for Windows, Version 16.0. Chicago, IL.

Tajima, F., and Nei, M. (1984). Estimation of evolutionary distance between nucleotide sequences. Molecular Biology and Evolution, 1, 269-285.

Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. (2013). MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution, 30(12), 2725-2729.

Tian, C. F., Wang, E. T., Han, T. X., Sui, X. H., and Chen, W. X. (2007). Genetic diversity of rhizobia associated with Vicia faba in three ecological regions of China. Archives of Microbiology, 188(3), 273-282.

Tlusty, B., Grossman, J. M., and Graham, P. H. (2004). Selection of rhizobia for prairie legumes used in restoration and reconstruction programs in Minnesota. Canadian Journal of Microbiology, 50, 977-983.

Trevor, B., Philippe, H., and Bryant, D. (2006). A simple and robust statistical test for detecting the presence of recombination. Genetics, 172, 2665–2681.

Turner, S. L., and Young, J. P. W. (2000). The glutamine synthetases of rhizobia: phylogenetics and evolutionary implications. Molecular Biology and Evolution, 17(2), 309-319.

Wang, D., Yang, S., Tang, F., and Zhu, H. (2012). Symbiosis specificity in the legume-rhizobial mutualism. Cellular Microbiology, 14(3), 334-342.

Ward, D. M., Cohan, F. M., Bhaya, D., Heidelberg, J. F., Kühl, M., and Grossman, A. (2007). Genomics, environmental genomics and the issue of microbial species. Heredity, 100(2), 207-219.

USDA, NRCS. (2012). The PLANTS Database (http://plants.usda.gov). National Plant Data Team, Greensboro, NC 27401-4901 USA. Accessed 24 September 2012.

Van Cauwenberghe, J., Verstraete, B., Lemaire, B., Lievens, B., Michiels, J., and Honnay, O. (2014). Population structure of root nodulating Rhizobium leguminosarum in Vicia cracca populations at local to regional geographic scales. Systematic and Applied Microbiology, 37(8), 613-621.

Vinuesa, P., Silva, C., Werner, D., and Martinez-Romero, E. (2005). Population genetics and phylogenetic inference in bacterial molecular systematics: the roles of migration and recombination in Bradyrhizobium species cohesion and delineation. Molecular Phylogenetics and Evolution, 34, 29-54.

Weakley, A. (2012). Flora of the Southern and Mid-Atlantic States. UNC Herbarium, North Carolina Botanical Garden, University of North Carolina.

Xiong, J., Liu, Y., Lin, X., Zhang, H., Zeng, J., Hou, J., Yang, Y., Yao, T., Knight, R., and Chu, H. (2012). Geographic distance and pH drives bacterial distribution in alkaline lake sediments across Tibetan Plateau. Environmental Microbiology, 14, 2457-2466.

Yang, S., Tang, F., Gao, M., Krishnan, H. B., and Zhu, H. (2010). R gene-controlled host specificity in the legume–rhizobia symbiosis. Proceedings of the National Academy of Sciences, USA, 107(43), 18735-18740.

Zahran, H. H. (2001). Rhizobia from wild legumes: diversity, taxonomy, ecology, nitrogen fixation. Journal of Biotechnology, 91(2), 143-153.

Zhang, X., Gou, H., Wang, R., Sui, X., Zhang, Y., Wang, E., Tian, C., and Chen, W. (2014). Genetic divergence of Bradyrhizobium nodulating soybeans revealed by multilocus sequence analysis of genes inside and outside the symbiotic island. Applied and Environmental Microbiology, 80(10), 3181–3190.

Zhang, Y. M., Li, Y., Chen, W. F., Wang, E. T., Tian, C. F., Li, Q. Q., and Chen, W. X. (2011). Biodiversity and biogeography of rhizobia associated with soybean plants grown in the North China Plain. Applied and Environmental Microbiology, 77(18), 6331-6342.

APPENDIX A

COLLECTION SITES, HERBARIUM RECORDS, COORDINATES (WSG83), AND SOIL VARIABLES

Site Herbarium accession number  Coordinates          pH   P    K     Ca      Mg    S    Na   Fe    Mn    Zn   Cu   B
R40  MISSA033001                 33.51080, -88.73812  7.6  0.1  15.3  3610.8  10.6  1.1  2.6  2.9   3.5   0.6  0.0  0.0
R41  MISSA033002                 33.61044, -88.74504  7.7  1.1  3.2   243.7   8.0   0.5  0.9  12.3  3.7   0.7  0.2  0.1
R43  MISSA033003                 33.90426, -88.84644  4.0  0.2  12.4  255.5   28.6  0.4  6.8  15.3  0.1   0.1  0.1  0.0
R45  MISSA033004                 33.53832, -88.39258  4.9  0.8  6.7   28.4    6.2   0.5  0.8  6.7   1.6   0.1  0.0  0.0
R46  MISSA033005                 33.83404, -88.54369  6.1  1.4  7.3   196.0   10.8  0.6  0.8  11.3  3.1   0.8  0.2  0.1
R47  MISSA033006                 33.98291, -88.50797  4.6  0.5  4.3   25.6    9.4   1.1  1.4  28.8  1.7   0.2  0.1  0.0
R49  MISSA033007                 33.69892, -89.35080  5.0  0.3  8.3   40.8    25.6  0.9  0.4  4.3   0.8   0.0  0.0  0.0
R52  MISSA033008                 33.92705, -89.60973  5.1  1.2  8.0   64.4    10.8  1.2  1.6  26.5  11.5  0.3  0.1  0.0
R55  MISSA033009                 33.37613, -88.97548  4.5  0.9  26.9  183.7   86.5  0.4  4.0  18.6  11.0  0.4  0.1  0.1
R56  MISSA033010                 33.10161, -89.96860  6.0  1.6  24.9  182.5   32.9  0.4  1.0  11.7  4.0   0.2  0.1  0.0
R57  MISSA033011                 33.10603, -90.02042  4.8  0.9  11.8  102.1   32.4  0.7  1.3  15.8  4.7   0.3  0.2  0.0
R58  MISSA033012                 33.20933, -90.22324  7.6  8.1  33.3  443.3   48.9  1.0  1.0  21.5  8.6   1.4  0.6  0.2
R59  MISSA033013                 33.28662, -90.22324  6.6  4.9  15.1  225.4   40.4  0.6  0.8  13.6  6.3   0.9  0.2  0.1
R60  MISSA033014                 33.74908, -90.01002  5.6  1.5  16.0  144.2   38.8  0.7  1.4  24.5  13.9  0.3  0.1  0.0
R61  MISSA033015                 33.47292, -90.13770  6.0  3.9  14.2  158.6   27.1  0.7  1.1  19.9  3.7   1.2  0.1  0.0
R62  MISSA033016                 32.06598, -88.72910  7.8  0.1  3.2   177.5   10.8  0.3  1.0  2.7   0.4   0.1  0.1  0.0
R63  MISSA033017                 31.84590, -88.68717  7.4  0.5  27.0  1651.4  23.5  2.0  4.0  4.9   4.4   0.3  0.2  0.2
R64  MISSA033018                 31.70537, -89.03673  5.2  0.6  3.8   25.0    5.9   0.7  0.8  10.7  3.1   0.3  0.0  0.0
R65  MISSA033019                 31.70167, -89.04276  4.8  1.5  9.4   144.0   30.7  0.7  1.2  13.5  1.4   0.5  0.0  0.0
R66  MISSA033020                 31.94829, -89.29206  7.0  1.3  3.7   237.0   7.2   0.7  1.0  13.7  4.5   0.3  0.0  0.0
R67  MISSA033021                 32.02475, -89.40509  6.2  0.8  6.0   119.6   15.6  0.4  1.1  6.7   6.4   0.4  0.0  0.0
R68  MISSA033022                 32.25506, -89.40965  6.4  0.4  22.1  689.5   53.4  0.6  1.3  11.9  7.7   0.3  0.1  0.1
R69  MISSA033023                 33.27292, -88.78952  6.4  1.0  4.8   214.9   13.9  0.5  1.2  11.1  15.6  0.5  0.1  0.1

APPENDIX B

GENETIC DIVERSITY OF RHIZOBIA AT EACH SAMPLING SITE FOR THE CONCATENATED DATA SET, TRUA, AND NIFH

      R40     R41     R43     R45     R46     R47     R49     R52     R55     R56
Concatenated
N     10      9       5       9       11      7       10      5       12      11
S     184     154     99      171     206     203     36      125     54      206
H     9       8       5       9       11      7       8       5       9       11
Hd    0.978   0.972   1.000   1.000   1.000   1.000   0.933   1.000   0.955   1.000
Pi    0.099   0.072   0.039   0.079   0.084   0.087   0.017   0.100   0.012   0.084
truA
N     10      9       5       9       11      7       10      5       12      4
S     101     77      37      104     126     78      29      86      37      68
H     6       6       5       7       10      6       6       5       5       4
Hd    0.778   0.889   1.000   0.917   0.982   0.952   0.889   1.000   0.848   1.000
Pi    0.129   0.093   0.035   0.101   0.105   0.072   0.041   0.111   0.041   0.012
nifH
N     10      9       5       9       11      7       10      5       12      4
S     83      77      62      67      80      125     7       39      17      66
H     7       8       5       9       8       6       5       5       7       3
Hd    0.911   0.972   1.000   1.000   0.891   0.952   0.800   1.000   0.833   0.833
Pi    0.078   0.056   0.043   0.060   0.062   0.0977  0.0046  0.084   0.008   0.071

      R57     R58     R59     R60     R61     R62     R64     R65     R66     R67     R68     R69
Concatenated
N     6       9       10      6       2       2       7       6       8       5       5       11
S     25      59      36      166     140     54      92      115     96      144     142     129
H     3       9       8       6       2       2       7       6       6       5       5       10
Hd    0.600   1.000   0.933   1.000   1.000   1.000   1.000   1.000   0.929   1.000   1.000   0.0982
Pi    0.008   0.050   0.017   0.092   0.130   0.062   0.058   0.060   0.034   0.082   0.078   0.060
truA
N     6       9       5       6       2       2       7       6       8       5       5       11
S     20      28      96      101     74      52      49      79      29      87      83      67
H     3       7       4       5       2       2       6       5       5       4       4       8
Hd    0.600   0.944   0.900   0.933   1.000   1.000   0.950   0.933   0.893   0.900   0.900   0.930
Pi    0.016   0.063   0.120   0.150   0.160   0.130   0.049   0.077   0.046   0.089   0.107   0.081
nifH
N     6       9       5       6       2       2       7       6       8       5       5       11
S     5       31      14      65      66      2       43      36      67      57      59      62
H     2       6       4       5       2       2       5       5       3       4       5       7
Hd    0.333   0.833   0.900   0.933   1.000   1.000   0.905   0.933   0.464   0.900   1.000   0.873
Pi    0.0027  0.039   0.062   0.062   0.108   0.004   0.070   0.038   0.027   0.075   0.056   0.048

N = number of sequences; S = segregating sites; H = haplotypes; Hd = haplotype diversity; Pi = nucleotide diversity
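
The five statistics reported in this appendix can be illustrated with a short Python sketch. It is not the software used to produce the table; it assumes the standard estimators (haplotype diversity as n/(n-1) times one minus the sum of squared haplotype frequencies, and nucleotide diversity as the mean proportion of pairwise differences per site), and it ignores alignment gaps and ambiguous bases, which population genetics programs may treat differently. The function name `diversity_stats` and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def diversity_stats(seqs):
    """Return N, S, H, Hd and Pi for a list of aligned, equal-length sequences."""
    n = len(seqs)
    if n < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    length = len(seqs[0])
    # S: alignment columns that show more than one nucleotide state
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # H: number of distinct sequences (haplotypes)
    counts = Counter(seqs)
    h = len(counts)
    # Hd: assumed estimator n/(n-1) * (1 - sum of squared haplotype frequencies)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Pi: mean number of pairwise differences, divided by alignment length
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    pi = diffs / (len(pairs) * length)
    return {"N": n, "S": s_sites, "H": h, "Hd": hd, "Pi": pi}

# Toy alignment for illustration only (not thesis data)
toy = ["AAAA", "AAAT", "AATT", "AAAA"]
print(diversity_stats(toy))
```

Under this estimator, a site where every sequence is a distinct haplotype has Hd = 1, which is the pattern seen in the table wherever H equals N.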

APPENDIX C

GENETIC DISTANCE MATRIX BETWEEN SAMPLING SITES: A LOWER-TRIANGULAR DISTANCE MATRIX OF THE CONCATENATED GENES WAS GENERATED USING THE TAJIMA-NEI MODEL (TAJIMA AND NEI, 1984) WITH THE GAMMA PARAMETER IN MEGA (TAMURA ET AL., 2013)

     R40   R41   R43   R45   R46   R47   R49   R52   R55   R56   R57   R58   R59   R60   R61   R62   R64   R65   R66   R67   R68   R69
R40  0.00
R41  0.14  0.00
R43  0.16  0.12  0.00
R45  0.16  0.11  0.07  0.00
R46  0.15  0.09  0.13  0.12  0.00
R47  0.17  0.15  0.08  0.11  0.15  0.00
R49  0.17  0.13  0.03  0.06  0.14  0.07  0.00
R52  0.16  0.10  0.09  0.10  0.11  0.12  0.09  0.00
R55  0.17  0.13  0.03  0.07  0.14  0.07  0.02  0.10  0.00
R56  0.15  0.10  0.09  0.09  0.11  0.12  0.09  0.09  0.09  0.00
R57  0.17  0.12  0.03  0.07  0.14  0.08  0.02  0.10  0.02  0.09  0.00
R58  0.15  0.07  0.14  0.12  0.08  0.17  0.15  0.11  0.15  0.11  0.15  0.00
R59  0.16  0.11  0.09  0.10  0.11  0.12  0.10  0.11  0.09  0.10  0.09  0.10  0.00
R60  0.15  0.10  0.09  0.10  0.11  0.12  0.09  0.10  0.09  0.09  0.08  0.11  0.10  0.00
R61  0.16  0.10  0.09  0.10  0.10  0.12  0.09  0.10  0.09  0.10  0.09  0.10  0.10  0.10  0.00
R62  0.15  0.09  0.09  0.10  0.11  0.11  0.09  0.09  0.09  0.10  0.10  0.10  0.10  0.09  0.10  0.00
R64  0.16  0.11  0.05  0.08  0.13  0.09  0.05  0.09  0.05  0.09  0.05  0.13  0.10  0.09  0.09  0.08  0.00
R65  0.17  0.12  0.06  0.08  0.13  0.09  0.05  0.10  0.05  0.10  0.05  0.14  0.10  0.10  0.09  0.09  0.07  0.00
R66  0.17  0.12  0.04  0.07  0.14  0.08  0.03  0.09  0.03  0.09  0.03  0.15  0.10  0.09  0.09  0.09  0.05  0.06  0.00
R67  0.15  0.10  0.08  0.09  0.11  0.11  0.08  0.10  0.08  0.10  0.08  0.11  0.10  0.10  0.09  0.08  0.08  0.08  0.08  0.00
R68  0.15  0.08  0.10  0.10  0.10  0.13  0.11  0.10  0.11  0.10  0.11  0.08  0.10  0.10  0.09  0.08  0.10  0.10  0.10  0.09  0.00
R69  0.16  0.11  0.06  0.08  0.13  0.09  0.05  0.09  0.05  0.09  0.05  0.13  0.10  0.09  0.09  0.09  0.07  0.07  0.06  0.09  0.10  0.00

APPENDIX D

GEOGRAPHIC DISTANCE MATRIX OF SAMPLING SITES GENERATED USING THE HAVERSINE FORMULA (RICK 1999)

     R40    R41    R43    R45    R46    R47    R49    R52    R55    R56    R57    R58    R59    R60    R61    R62    R64    R65    R66    R67    R68    R69
R40  0
R41  11.1   0
R43  44.8   33.9   0
R45  32.1   33.6   58.5   0
R46  40.1   31.1   29.0   35.7   0
R47  56.6   46.8   32.4   50.6   16.8   0
R49  60.4   56.9   51.9   90.5   76.1   84     0
R52  92.9   87.3   70.5   120.6  98.9   101.8  34.9   0
R55  26.6   33.7   59.9   57     64.7   80.2   43.2   84.9   0
R56  123.1  126.9  137.1  154.3  155.2  167.1  87.8   97.6   97.3   0
R57  127.4  131.1  140.4  158.7  159.1  170.7  90.6   98.9   101.7  4.85   0
R58  141.9  144.3  149.2  173.9  170.5  180.7  97.5   97.9   117.4  26.6   22.1   0
R59  140.1  141.8  144.8  172.2  167.1  176.7  92.9   91.1   116.3  31.4   27.6   8.6    0
R60  120.7  118.1  108.9  151.5  135.8  141.1  61.2   41.9   104.4  72.1   71.5   63.2   55.1   0
R61  129.9  130.0  128.7  162    152.9  161.0  77.1   70.2   108.4  44.2   42.2   30.4   22.2   32.91  0
R62  160.7  171.7  204.7  166.7  197.4  214.1  190.7  222.6  147.5  163.5  167.4  189    194.9  222.1  132.7  0
R64  202.7  213.6  245.2  212.6  241.1  258.0  223.6  252.8  185.9  178.2  181.1  200.9  208.1  244.8  111.4  51.9   0
R65  203.2  214.1  245.6  213.1  241.7  258.5  223.9  253.1  186.3  178.3  181.1  200.9  208.1  245.0  111.0  52.6   0.7    0
R66  181.3  191.8  221.4  195.8  221.0  237.8  194.7  222.0  161.5  143.1  158    165.1  172.5  211.2  80.8   58.5   36.2   36.2   0
R67  176.6  186.8  215.4  193.1  216.5  233.3  186.2  212.4  155.6  130.9  133.3  152.4  159.9  199.8  69.2   67.8   49.7   49.6   13.6   0
R68  153.1  163.0  190.7  171.4  193.2  209.7  160.6  186.8  131.1  107.7  110.5  130.6  137.6  175.3  71.7   71.18  70.5   70.6   35.9   25.6   0
R69  26.9   37.8   70.4   47.2   66.4   83.1   70.3   105.2  20.7   111.4  116.0  133.5  133.3  124.9  184.2  134.3  175.8  176.3  154.6  150.3  127.2  0
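
Entries in this matrix can be checked against the coordinates listed in Appendix A. The following is a minimal Python sketch of the haversine calculation, not the script used in the thesis. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions, as is the reading of each coordinate pair as latitude followed by negative (west) longitude in decimal degrees.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the thesis does not state the value used

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates for sites R40 and R41 as listed in Appendix A
r40 = (33.51080, -88.73812)
r41 = (33.61044, -88.74504)
print(round(haversine_km(*r40, *r41), 1))  # about 11.1, in line with the R40-R41 entry
```

Under these assumptions the R40-R41 and R64-R65 pairs come out at roughly 11.1 and 0.7, which suggests the matrix is expressed in kilometers.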

APPENDIX E

ENVIRONMENTAL DISTANCE MATRIX OF SAMPLING SITES: A COMPOSITE SOIL DISTANCE MATRIX WAS GENERATED USING SQUARED EUCLIDEAN DISTANCE WITH A Z-SCORE VARIANCE CORRELATION IN SPSS (2007).

     R40   R41   R43   R45   R46   R47   R49   R52   R55   R56   R57   R58   R59   R60   R61   R62   R64   R65   R66   R67   R68   R69
R40  0.0
R41  33.5  0.0
R43  53.9  34.6  0.0
R45  39.3  11.6  22.1  0.0
R46  34.9  2.6   30.2  8.7   0.0
R47  46.6  20.2  28.4  14.7  15.9  0.0
R49  34.3  14.8  29.6  2.8   10.8  13.2  0.0
R52  43.8  20.7  35.1  19.5  17.3  5.8   17.6  0.0
R55  48.5  24.9  14.8  20.1  19.5  24.9  26.8  18.8  0.0
R56  38.2  11.4  25.0  7.5   9.2   22.0  12.2  21.8  10.3  0.0
R57  38.6  9.2   20.4  5.1   4.4   8.4   7.1   9.5   10.7  6.2   0.0
R58  78.0  47.5  92.1  75.2  41.2  67.4  74.3  55.7  54.9  51.0  49.8  0.0
R59  41.1  7.9   39.9  18.5  5.3   25.1  22.0  20.9  20.0  10.9  10.6  22.5  0.0
R60  44.7  16.3  31.4  16.7  14.4  13.8  20.3  5.2   8.7   11.0  7.5   49.4  14.1  0.0
R61  41.4  10.0  35.9  18.5  6.5   18.1  22.0  18.0  19.8  13.3  11.2  32.5  3.9   14.1  0.0
R62  39.2  7.7   33.5  8.8   12.8  32.5  14.0  37.0  32.9  12.3  16.7  81.6  22.9  28.5  26.8  0.0
R64  35.5  7.7   24.0  1.7   5.0   9.1   2.9   12.3  19.3  9.3   3.3   66.3  14.5  12.5  12.8  10.9  0.0
R65  35.1  9.4   20.3  2.9   5.1   8.2   4.8   12.6  15.6  7.3   2.8   58.9  11.8  11.9  8.4   14.5  1.3   0.0
R66  30.9  3.1   28.5  5.7   5.4   12.5  7.8   12.8  20.9  8.5   6.1   58.2  11.2  10.6  11.9  7.0   3.2   5.0   0.0
R67  34.3  4.5   25.7  3.9   4.6   16.5  8.3   18.4  16.6  6.5   6     62.3  11.3  12    12.8  5.9   3.1   5     2.8   0
R68  25.3  9.9   26.1  9.1   8.6   18.1  10.6  13.8  9.5   2.9   5.5   49.9  11.8  6.9   13.9  14.2  8.0   7.6   6.3   5.8   0.0
R69  40.1  9.8   36.4  14.3  10.9  24.8  19.2  14.2  16.1  13.8  11.1  58.7  13.5  6.9   16.6  17.7  10.5  13.5  7.8   5.0   8.8   0.0
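
A composite soil distance of this kind can be sketched in Python as follows. This is an illustration under stated assumptions, not a reproduction of the SPSS procedure: it assumes the appendix title refers to standardizing each soil variable to z-scores (using the sample standard deviation) before taking the squared Euclidean distance between sites. The function names and the three-site input are hypothetical and are not thesis data.

```python
from statistics import mean, stdev

def zscore_columns(rows):
    """Standardize each column (soil variable) to mean 0 and sample SD 1.

    Assumes every column varies across sites; a constant column would
    have SD 0 and cause a division by zero.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def squared_euclidean_matrix(rows):
    """Full symmetric matrix of squared Euclidean distances on z-scored rows."""
    z = zscore_columns(rows)
    n = len(z)
    return [[sum((a - b) ** 2 for a, b in zip(z[i], z[j])) for j in range(n)]
            for i in range(n)]

# Hypothetical values for three sites and two soil variables (illustration only)
soil = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
for row in squared_euclidean_matrix(soil):
    print([round(d, 1) for d in row])
```

Standardizing first keeps variables measured on large scales, such as Ca in Appendix A, from dominating the composite distance.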

APPENDIX F

PHYLOGENETIC NETWORK DEPICTING SAMPLING SITES IN EACH GROUP

[Figure F.1: phylogenetic network; image not reproduced in this text. Tip labels are sampling sites (R40 through R69) assigned to Group1 through Group5, with a scale bar of 0.01. Reference sequences legible among the labels: B. sp. CMVU30 KC247138.1; B. elkanii EU418414.1; B. sp. TUXTLAS JF266695.1; B. elkanii KF859889.1; B. sp. CF1 JF821007.1; B. elkanii JQ810094.1; B. elkanii EU622080.1; B. sp. CMW2 JN993734.1; B. sp. TUXTLAS JF266683.1; B. sp. lppb2 JF821027.1; B. japonicum HM107280.1; B. sp. CCBAU HM107281.1; B. arachidis JQ011358.1; B. japonicum EF512283.1; B. sp. SEMIA HQ259529.1; B. sp. LcCT4 JF821022.1; B. sp. CFRR1 JF821044.1.]

Figure F.1 Sampling sites represented by RXX

Sampling site codes follow those indicated in Appendix A.
