
Genome size in the UK flora and how this relates to their nitrogen and water tolerance

Lauren Pollitt

2019

Dissertation submitted for the degree of Master of Science in Plant and Fungal Taxonomy, Diversity and Conservation awarded by Queen Mary, University of London. https://doi.org/10.34885/79

© The Author. All rights reserved.

i. Abstract

Genome size (GS) is defined as the total amount of DNA in the unreplicated nucleus of an organism, and the term is often used synonymously with ‘1C-value’. GS varies greatly in flowering plants. Nitrogen (N) and water are considered major limiting factors for plant growth, and their availability could correlate with plant GS. Nitrogen is one of the main building blocks of DNA, so plants with larger genomes may have elevated N requirements. Water is lost through stomata, pores in the leaf epidermis that are each formed by two guard cells. GS has been shown to correlate positively with guard cell size, and previous studies have demonstrated that smaller guard cells result in reduced water loss. This study considers whether large-genome Fabaceae species are more likely to be excluded from dry, N-limited environments, and whether there is an interaction between N and water availability that affects GS.

Ellenberg values were used as a proxy for water and N availability in habitats. GS values for the UK Fabaceae flora were obtained from the Royal Botanic Gardens, Kew C-value database and from flow cytometry measurements. Guard cell measurements were obtained from stomatal peels. Statistical analysis in R revealed statistically significant relationships between genome size and water, nitrogen and their interaction, with one interesting outlier species. Phylogenetic trees, ancestral reconstruction and a phylogenetic generalised least squares (PGLS) model revealed a strong phylogenetic signal in the data.

This study demonstrates that N and water availability have an effect on GS in UK Fabaceae. Subsequent studies should include information on mycorrhizal and rhizobial associations in relation to the sourcing of nitrogen and water, and how this affects GS.

Key words: DNA – Ellenberg – environment – Fabaceae – flow cytometry – guard cells – phylogenetic tree – stomata


ii. Contents

i. Abstract
1. Acknowledgements
2. Introduction
2.1 Genome size in plants
2.2 UK Fabaceae
2.3 The effect of moisture availability on genome size in plants
2.4 The effect of nitrogen availability on genome size in Fabaceae
2.5 Ellenberg values
2.6 Aims of study
3. Materials and Methods
3.1 Ellenberg values and database
3.2 Flow cytometry
3.3 Stomatal measurements
3.4 Statistical analysis in R
3.5 Phylogenetic reconstruction in R
4. Results
4.1 The effect of moisture on GS in UK Fabaceae flora; indicated by EF and GCS
4.2 The effect of nitrogen, measured by EN, on GS in UK Fabaceae flora
4.3 Interaction between EN and EF on GS
4.4 Accounting for phylogenetic nonindependence
4.5 Phylogenetic ancestral reconstruction
5. Discussion
5.1 Overall genome size trends in UK Fabaceae
5.2 The effects of Nitrogen on GS
5.3 The effects of moisture on GS
5.4 Outlier Lathyrus sylvestris
5.5 Evaluation of the study
5.6 Conclusions
6. Tables and Figures
6.1 Tables
6.2 Figures
7. References
8. Appendices and supplementary material


1. Acknowledgements

I would like to thank my supervisor Professor Andrew Leitch for his time, support and dedication throughout my research project. I thank Dr Ilia Leitch for her guiding wisdom, Dr Robyn Powell for her flow cytometry measurements and expertise, and Marie Henniges for her reliable help and insightful contribution throughout.


2. Introduction

2.1 Genome size in plants

The first genome to be measured was that of Lilium longiflorum cv. Croft, by Ogur et al. in 1951 (Bennett and Leitch, 2005; Greilhuber, 2013). Since then, over 8,500 species of plants have been measured, including 3,500 angiosperms, which comprise 1% of angiosperm species, and 485 of angiosperm families (Soltis et al., 2003; Leitch et al., 1998; Pellicer and Leitch, 2014). A staggering 2,400-fold variation in genome size exists in angiosperms (Dodsworth et al., 2015; Guignard et al., 2017; Pellicer et al., 2010; Soltis et al., 2003). Whole-genome duplication, also called polyploidisation, can cause an increase in genome size (Wendel, 2015). All angiosperms are considered to be paleo-polyploids, having undergone at least one round of polyploidisation in their evolutionary history (Wendel, 2015; Dodsworth, 2016; Guignard et al., 2016). An extreme example is the historically 288-ploid Brassica genus (Wendel, 2015). GS can also increase through the proliferation of repeated-sequence DNA or transposable elements (Dodsworth et al., 2015; Hawkins et al., 2008). A decrease in genome size can be caused by diploidisation, a process involving genome downsizing, which can occur over millennia (Wendel, 2015). Additionally, intrastrand homologous recombination, illegitimate recombination and non-homologous end joining can drive genome size down (Dodsworth et al., 2015; Hawkins et al., 2008). Because of this downsizing process, genome size and ploidy level are not necessarily linked (Soltis, 2003; Pellicer and Leitch, 2014). Considering that the ancestral angiosperm genome was estimated to be <1.4 pg, and the majority of extant angiosperms have small genomes of around 0.2 – 3.5 pg, there appears to be a large-scale genome downsizing trend across angiosperms (Bennett and Leitch, 2005; Dodsworth et al., 2015; Pellicer et al., 2014).
GS has been shown to be predictive of seed mass, pollen size, GC content of DNA, shoot phenology, UV radiation sensitivity, time to complete the cell cycle and cell size (Bennett, Beaulieu et al., 2008; Vesely et al., 2011). This study refers to the ‘1C-value’: the DNA amount in one chromosome set (chromosome number n) of the non-replicated gametic nucleus (Greilhuber et al., 2005). This is irrespective of ploidy level and is measured in picograms or megabase pairs (Pellicer and Leitch, 2014; Greilhuber et al., 2005; Bennett and Leitch, 2005). The 2C-value is the DNA amount in a non-replicated zygotic, diplophasic nucleus with chromosome number 2n (Greilhuber et al., 2005).

2.2 UK Fabaceae

Fabaceae has the third-highest number of species of any angiosperm family (LPWG, 2017). Legumes, including beans and pulses, are economically important crops (LPWG, 2017). Total world exports of pulses more than doubled from 1990 to 2012, when the value of pulse exports was reported at 9.5 billion USD. The human diet has included Fabaceae species since the beginning of agriculture, 10,000 years ago (FAO, 2019). The United Nations marked 2016 as the ‘International Year of Pulses’, which raised awareness of the contribution of Fabaceae crop species to food security (FAO, 2019). This is especially important as the human population continues to grow and climate change continues to threaten crop production (IPCC, 2014). The UK is a good environment for studying Fabaceae because the flora is well recorded and the limited number of diverse Fabaceae species makes for an excellent case study.

2.3 The effect of moisture availability on genome size in plants

Stomata are microscopic pores in the leaf epidermis which regulate water vapour loss (transpiration) and CO2 uptake for photosynthesis (Franks and Beerling, 2009). It is widely considered that the origin of stomata around 400 Mya allowed plants to emerge from the water and colonise land (Franks and Beerling, 2009). Each stoma comprises two guard cells which respond to environmental stimuli by opening or closing an aperture, thus adjusting the flow of CO2 and water (Franks and Beerling, 2009). The turgor pressure in guard cells decreases when water availability is low, which reduces the aperture size (Bertolino et al., 2019). Water use efficiency is higher when stomata are smaller because the stimulus-response time is shorter, meaning guard cells move faster (Bertolino et al., 2019). When water is scarce, the stomata react quickly, thus conserving as much water as possible (Bertolino et al., 2019). Maximal stomatal conductance (gmax) is determined by stomatal size and density (Bertolino et al., 2019). Genome size and guard cell size are strongly positively correlated, whereas genome size and stomatal density are inversely correlated (Beaulieu et al., 2008; Bertolino et al., 2019; Franks and Beerling, 2009; Jordan et al., 2014). Plants with smaller genomes and smaller stomata may be better adapted to drier conditions (Bertolino et al., 2019). Masterson (1994) used the correlation between guard cell size and GS to estimate the GS of ancestral plant species from the fossil record (Franks and Beerling, 2009; Masterson, 1994).

The predicted changes in climate and the warming of the planet will strain agricultural systems (Bertolino et al., 2019). Crop production will be negatively impacted by increased drought frequency (Bertolino et al., 2019). Knowledge of GS distribution across the evolutionary tree may help selection of crop wild relatives for breeding into crop species. Additionally, recent research points towards the potential manipulation of stomata to increase tolerance to aridity (Bertolino et al., 2019). It is predicted that, at high water levels, there will be correlations between nitrate availability and GS; when water is low, nitrogen content may not be limiting.

2.4 The effect of nitrogen availability on genome size in Fabaceae

Nitrogen (N) comprises 78% of the Earth’s atmosphere in the form of nitrogen gas (N2); however, this is unusable by plants until reduced or fixed (Wagner, 2011). Nitrogen is found at concentrations of 0.1 mM to 1 mM in soil (Guignard et al., 2016). It is an essential, limiting element for plant production and development (Wagner, 2011). It is a major component of chlorophyll, a fundamental pigment required for photosynthesis, and a key component of amino acids (Wagner, 2011). Nitrogen is also found in ATP, and comprises 39% of the mass of DNA’s nucleic acids (Greilhuber and Leitch, 2013; Guignard et al., 2016). Reduced nitrogen is acquired from: 1) decomposition of organic matter, 2) nitrate and ammonia (NH3) fertiliser derived from the Haber-Bosch process, 3) lightning, and 4) biological nitrogen fixation (Wagner, 2011). Biological nitrogen fixation was discovered by Beijerinck in 1901 (Wagner, 2011). Arguably the most important nitrogen-fixing symbiotic associations are between Fabaceae hosts and Rhizobium or Bradyrhizobium bacteria (Hirsch, 2001; Wagner, 2011). A host plant colonised by Rhizobium or Bradyrhizobium forms nodules in its roots, which house the bacteria. The majority of the UK Fabaceae are in the subfamily Papilionoideae, and around 90% of Papilionoideae nodulate (Hirsch, 2001). Rhizobium or Bradyrhizobium use nitrogenase to catalyse the conversion of N2 to NH3, which can be assimilated by plants into nucleic acids. In return, the bacteria receive sugars from photosynthesis, which are used as energy to fix nitrogen (Wagner, 2011). The bacteria-host interaction is often highly specific; for example, Rhizobium leguminosarum biovar trifolii will only nodulate Trifolium (Wagner, 2011). Nodulation may have evolved from mycorrhizal association (Hirsch, 2001).
According to the fossil record, mycorrhizal associations originated 400 Mya, and the symbiotic pathway used in mycorrhizal association could have been adapted to nodulation symbiosis. Early nodulin genes are expressed in both nodulating and mycorrhizal-forming associations (Hirsch, 2001; Brundrett, 2002). Arbuscular mycorrhizal associations (AM) are found in 72% of angiosperms and ectomycorrhizal associations (EcM) are found in 2% of angiosperms (Hirsch, 2001). Fabaceae have both AM and EcM (Hirsch, 2001). The hypothesis is that species which mine nitrogen through rhizobia and mycorrhizae may have larger genomes. These Fabaceae may be better equipped to persist in low-nutrient environments, where genome sizes are generally lower, perhaps leading to larger than predicted Fabaceae genomes in nutrient-poor areas.

Low nitrogen levels in the soil, or an inability to obtain nitrogen, may be a selection pressure against plant species with larger genomes (Guignard et al., 2016). Plants with smaller genomes may be less restricted by nitrogen availability. In community ecology observations, higher mean genome sizes were associated with plots experimentally fertilised with nitrogen and phosphorus, whereas plants with smaller genomes were found in plots limited in nitrogen and phosphorus. This raises the possibility that there is selection for smaller genomes over evolutionary time (Guignard et al., 2016).

2.5 Ellenberg values

Ellenberg values describe a plant’s optimum niche in terms of light, pH, salt, moisture and nitrogen (or general soil fertility) (Hill et al., 1999). Ellenberg values are not measurements; they are floristic indicator values assigned by experts. Many of these categories may correlate in some way with genome size, although this study focuses on the nitrogen and water values, Ellenberg N and F respectively. Ellenberg values do not have separate scales for phosphorus or other nutrients, so it is assumed that the N value also indicates the general nutrient level, including phosphorus, which is a major component of DNA. Nitrogen is represented by ‘N’ and uses discrete integer values from one to nine, calibrated specifically for the UK flora (Hill et al., 1999). An Ellenberg N value (EN) of ‘1’ indicates extremely infertile land, ‘5’ indicates intermediately fertile soil, ‘7’ indicates plants found in richly fertile places, and ‘9’ indicates extremely rich situations near pollution and cattle fields (Hill et al., 1999). The Ellenberg F value (EF) refers to the water content of soil, and ranges from one to twelve (Hill et al., 1999). EF ‘1’ is an indicator of extreme dryness, and ‘12’ indicates a submerged plant that is constantly under water (Hill et al., 1999).

2.6 Aims of study

This study aims to determine whether the nitrogen and water content of soil where UK Fabaceae species preferentially grow is correlated with their genome size. This study is situated within the context of exploring a wider phenomenon; an attempt to uncover which factors impact the genome size of plants, particularly whether there is causality in the correlations and relationships that are found.


3. Materials and Methods

3.1 Ellenberg values and database

Fabaceae genome sizes from the C-value database (Kew and Wakehurst, release 8.0, 2012) and Ellenberg data were compiled into a database, edited by Dr Ilia Leitch. There were 75 UK Fabaceae species in the database; however, genome size data were missing for some of them. These data were sought using the methods detailed below, and the final analysis was carried out on 56 species.

3.2 Flow cytometry

Flow cytometry was used to measure the GS of fourteen species for which reliable measurements were not yet available in the C-value database (Table 3). Seeds were ordered from the Millennium Seed Bank (MSB) (Table 1), and the key to the pea family (Rose, 2006) was used to identify Ulex minor, located in Hainault Forest (Figure 1), for sample collection. Seeds were soaked in water for 48 hours to soften the seed coat, which had hardened through extreme desiccation in the seed bank.

Method

1. A total of 5-7 seeds of each target species, along with the reference standard leaf tissue, each around 1 cm², were placed into a 6 cm petri dish. Added to this was 1 mL of one of the isolation buffers, General Purpose Buffer (GPB) or Ebihara buffer, stored on ice (Pellicer, 2014).
2. The presence of standards, Petroselinum crispum (parsley), Solanum lycopersicum (tomato) or Pisum sativum (pea) (see Table 2 for genome sizes), enabled subsequent calculation of the relative size of the target genome.
3. A razor blade was used to chop the sample into smaller pieces of around 0.5 mm². A second 1 mL of GPB was added before filtering the suspension through a 30 μm “CellTrics” mesh filter into a labelled flow-cytometry tube. 100 μl of the fluorochrome propidium iodide was added to the nuclei suspension, and the mixture was then vortexed.
4. The sample was incubated for a minute on ice and then inserted into a Partec Cyflow SL3 flow cytometer (Partec GmbH, Münster, Germany) fitted with a 100 mW green solid-state laser (532 nm, Cobolt Samba, Solna, Sweden). This measured the relative fluorescence of 5000 particles. Samples, dye and buffers were kept on ice, and chopping of samples was conducted over ice, in order to inhibit the degradation of DNA by DNase (Pellicer, 2014).
5. Three technical replicates of each seed were processed. The output histograms were analysed with FloMax software, version 2.4 (Partec GmbH). The flow cytometer passes the stained nuclei in single file through a capillary tube, where the laser is projected onto the nuclei. The photons scattered from each nucleus, known as the “forward scatter”, are captured by a detector in the machine and converted into a voltage. The higher the voltage, the more DNA is in the cell (Sysmex, n.d.).
6. The FloMax software generates histograms in which the position of the peaks relates to genome size (Figure 3). Flow cytometry histograms usually show one 1C-DNA and one 2C-DNA peak for each of the target and the reference standard. The FloMax “Peak analysis” function provides descriptive statistics such as the mean value of each peak, the confidence interval and the coefficient of variation (“CV%”). A CV% of publication quality, which is around 3%, was aimed for (Pellicer, 2014). The equation used by the flow cytometer is: CV% = (standard deviation of the peak / mean channel position of the peak) × 100.

Calculation of the 2C DNA genome size of a target species:

$$\text{2C DNA content of target (pg)} = \frac{\text{target sample mean 1C peak}}{\text{standard sample mean 1C peak}} \times \text{2C DNA content of standard (pg)}$$
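The two calculations above (CV% of a peak and the genome size of the target relative to the standard) can be illustrated with a minimal Python sketch. This is not the FloMax implementation: the function names and all numerical values are hypothetical, and the sample standard deviation is assumed for the CV%.

```python
from statistics import mean, stdev


def cv_percent(channel_values):
    """CV% of one peak: (standard deviation / mean channel position) * 100.

    Assumption: the sample standard deviation is used.
    """
    return stdev(channel_values) / mean(channel_values) * 100


def target_2c_pg(target_peak_mean, standard_peak_mean, standard_2c_pg):
    """Scale the known 2C value of the standard by the ratio of peak means."""
    return target_peak_mean / standard_peak_mean * standard_2c_pg


# Hypothetical peak means (channel numbers) and a hypothetical 2C value
# for the standard; none of these are measurements from this study.
target_peak = 150.0
standard_peak = 100.0
standard_2c = 4.0  # pg

genome_2c = target_2c_pg(target_peak, standard_peak, standard_2c)
genome_1c = genome_2c / 2  # the 1C-value is half the 2C-value

print(f"2C = {genome_2c:.2f} pg, 1C = {genome_1c:.2f} pg")
print(f"CV% = {cv_percent([98, 100, 102]):.1f}")
```

A peak with a CV% well above the roughly 3% target would, under this scheme, be re-run rather than used in the ratio.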

3.3 Stomatal measurements

In order to obtain measurements of the guard cells, clear nail varnish was applied to either side of the midrib on the adaxial surface of fully expanded leaves (Beaulieu et al., 2008). The region either side of the midrib has guard cell lengths (GCL) representative of the leaf mean GCL (Bertolino et al., 2019). Following a half-hour drying period, the epidermal impression in the nail varnish was peeled off and mounted in glycerol on a microscope slide. A Leica DMLB compound microscope fitted with a CMOS Leica DMC5400 camera, using the software ‘Leica Application Suite version 4.13.0’, was used for measurements at 20x magnification. Measurements were repeated so that data were obtained from three plants and three different leaves per species. Ten GCL measurements were taken per sample.

3.4 Statistical analysis in R

Following initial visualisation of the spread of the data and of trends in scatter plots, Pearson’s product-moment correlation tests and linear models were applied to the data. Linear models were repeated with log-transformed data to alleviate skew. All statistical tests were carried out using the programming language ‘R’ in ‘RStudio’ (R Core Team, 2019; RStudio Team, 2015).

3.5 Phylogenetic construction in R

An angiosperm phylogenetic tree (Smith and Brown, 2018) was pruned in steps: first to Fabaceae, then to UK Fabaceae, and finally to just the UK Fabaceae species in the database. The R packages used were ‘ape’ (Paradis and Schliep, 2018) and ‘phangorn’ (Schliep, 2011; Schliep et al., 2017). The package used to create phylogenetic trees was ‘phytools’ (Revell, 2012). The function ‘contMap’ in package ‘phytools’ was used to map continuous characters onto the phylogenetic tree, which generated a colour scale to represent the GS, EN and EF values. The function ‘fastAnc’ in package ‘phytools’ estimated the maximum likelihood ancestral states for GS, EN and EF (Figures 5-7). Phylogenetic correction was then applied to account for non-independence in the data. A Phylogenetic Generalised Least Squares (PGLS) model was applied with the package ‘caper’ (Orme et al., 2018). This is a type of linear model in which the covariance between species is compared to that expected under a Brownian motion process of evolution (Münkemüller et al., 2012). Under the assumption of a Brownian model of evolution, only the phylogenetic relationships of species define the expected covariance of their traits (Münkemüller et al., 2012). The inclusion of other factors which affect trait evolution means the phylogenetic influence needs to be down-weighted (Münkemüller et al., 2012). The coefficient Pagel’s lambda, returned in R, indicates the degree of phylogenetic signal in the data (Münkemüller et al., 2012). Pagel’s lambda varies from 0 to 1, where 0 indicates phylogenetic independence and 1 indicates that the covariation of species’ traits is directly proportional to their shared evolutionary history (Freckleton et al., 2002). Pagel’s lambda was estimated by maximum likelihood using the package ‘caper’.
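The role of Pagel’s lambda can be sketched as a rescaling of the expected covariance matrix: the off-diagonal entries (shared branch length between pairs of species) are multiplied by lambda, while the diagonal is left unchanged. The Python sketch below illustrates only this transformation; it is not the ‘caper’ code, the function name is hypothetical, and the three-species matrix is invented for the example.

```python
def pagel_lambda_transform(vcv, lam):
    """Multiply off-diagonal covariances by lambda; keep the diagonal."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie between 0 and 1")
    n = len(vcv)
    return [
        [vcv[i][j] if i == j else lam * vcv[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical Brownian-motion covariance matrix for three species:
# A and B share 0.8 units of branch length, C diverges at the root.
vcv = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

independent = pagel_lambda_transform(vcv, 0.0)  # no phylogenetic signal
brownian = pagel_lambda_transform(vcv, 1.0)     # pure Brownian motion
halfway = pagel_lambda_transform(vcv, 0.5)      # down-weighted phylogeny

print(independent[0][1], halfway[0][1], brownian[0][1])
```

A maximum likelihood estimate of lambda is then the value of this scaling that makes the observed trait data most probable under the model.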

4. Results

The dataset of UK Fabaceae flora (see Appendices) had a 34-fold range in GS. The smallest genome size in the data was 1C = 0.31 pg, found in Lotus angustissimus. The largest GS in the data was 1C = 10.5 pg, found in Lathyrus sylvestris. This could be considered an outlier, since the next highest genome size was 8.05 pg in Vicia sylvatica. Within the context of this dataset, a GS of 1C = 0 - 2 pg was defined as very small, 1C = 2 - 4 pg as small, 1C = 4 - 6 pg as medium, 1C = 6 - 8 pg as large and 1C > 8 pg as very large. The majority of the species (62%) had very small genomes, 17% were small, 9% medium, 8% large and 4% very large (Figure 4).
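The fold range and the size classes above can be expressed as a short Python sketch. The function name is hypothetical, and the handling of values that fall exactly on a class boundary (lower bound inclusive, with 8 pg counted as large) is an assumption, since the text does not specify it.

```python
def gs_category(one_c_pg):
    """Assign a 1C genome size (pg) to the size classes used in this dataset."""
    if one_c_pg <= 0:
        raise ValueError("genome size must be positive")
    if one_c_pg < 2:
        return "very small"
    if one_c_pg < 4:
        return "small"
    if one_c_pg < 6:
        return "medium"
    if one_c_pg <= 8:
        return "large"
    return "very large"


# Values quoted in the text.
smallest = 0.31      # Lotus angustissimus
largest = 10.5       # Lathyrus sylvestris
next_largest = 8.05  # the next highest genome size in the data

fold_range = largest / smallest
print(f"fold range = {fold_range:.1f}")
for value in (smallest, next_largest, largest):
    print(value, gs_category(value))
```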

4.1 The effect of moisture on GS in UK Fabaceae flora; indicated by EF and GCS

The largest share of the species (31.57%) have EF ‘4’, and 26.3% of the species have EF ‘5’. The greatest range of GS is in the EF ‘4’ category, closely followed by EF ‘5’ (see Figure 8). EF ‘3’ accounts for 22.8% of the data and EF ‘6’ for 14%. GS values in these categories are smaller than in ‘4’ and ‘5’ (see Figure 8). The higher EF categories ‘7’ and ‘8’ represent 5.25% of the data between them, and relate only to species with very small genomes (Figure 8). Species with very large genomes had either EF ‘4’ or ‘5’. Ellenberg ‘5’ is a moist-site indicator; such species prefer fresh soil of average dampness (Ellenberg, 1999). Ellenberg ‘4’ is described as being between 3 and 5, thus it is slightly drier than ‘5’ (Ellenberg, 1999). Species with EF ‘3’ had very small through to large genomes, but never very large. The scatter plot (Figure 8) suggests a slight negative correlation; however, the Pearson’s product-moment correlation test on EF and GS produced a correlation coefficient of 0.09. This value is close to zero, which indicates the absence of a linear relationship, and it was not significant (p = 0.5). A linear model of the effect of EF on GS was not significant. The residual plot was skewed, with three outliers (one of them severe, L. sylvestris). The skew in the data may have caused the correlation to be underrepresented. The linear model analysing the effect of log-transformed EF on GS was significant (p = 0.002). The R-squared value was 0.95, which is close to 1, suggesting a very tight correlation between EF and GS.

A Pearson’s product-moment correlation test on the relationship between GS and GCL (Table 4), returned a correlation r value 0.98, p < 0.05. This is almost a perfect positive linear relationship.

4.2 The effect of nitrogen, measured by EN, on GS in UK Fabaceae flora

In the dataset, EN ranged from ‘1’, indicating plants growing on extremely infertile soil, to ‘6’, indicating plants found in intermediately fertile to richly fertile places (Ellenberg, 1999). The Pearson’s product-moment correlation test on EN and GS produced a correlation coefficient of 0.3, signifying a weak positive relationship (p < 0.05). Most large and very large genomes were found to have an EN of ‘4’ or ‘5’ (Figure 8). However, the average GS for EN ‘4’ was 1C = 2.75 pg, and for EN ‘5’ was 1C = 2.71 pg. The highest EN category, ‘6’, had a larger average GS of 1C = 3.8 pg. Conversely, the outlier with the largest GS, Lathyrus sylvestris (1C = 10.5 pg), has an EN of ‘2’. This was also the modal EN, shared by 37.5% of the species. Lathyrus sylvestris has a low EN despite having a high GS, whereas most Vicia species have a similar N value (Figure 6). The average GS for EN ‘2’ was 1C = 1.95 pg (including L. sylvestris in the calculation).

The linear model of the effect of EN on GS was not significant. The residual plot was skewed and there were three outliers, one of which was an extreme outlier (L. sylvestris). The skew in the data may have caused the correlation to be underrepresented. The effect of log-transformed EN values on GS was significant (p = 0.003). The R-squared value was 0.85, which is close to 1, suggesting a tight correlation between EN and GS.

4.3 Interaction between EN and EF on GS

A linear model on the effect of the interaction between EF and EN was significant, p < 0.005.

4.4 Accounting for phylogenetic nonindependence

The Phylogenetic Generalised Least Squares (PGLS) models for the effect of EF on GS, of EN on GS and of the interaction of EF and EN were all significant (p < 0.001). All PGLS models had a Pagel’s lambda value of 1, estimated by maximum likelihood, which indicates a very strong phylogenetic signal.

4.5 Phylogenetic ancestral reconstruction

Beginning at the top of the phylogenetic tree and working towards the bottom (Figures 6 and 7), some notable ancestral reconstruction values are as follows. The most recent common ancestor (MRCA) of the Ulex genus had a GS of 3.03 pg, an EN of 2.28 and an EF of 5.67. A similar ratio of values was found in the MRCA of the Genista genus: GS of 1.58 pg, EN of 1.83 and EF of 5.47. The small genomes of the Lotus genus had an MRCA GS of 0.64 pg. The MRCA of Lotus pedunculatus and Lotus subbiflorus had a higher EN of 4.5, whereas the MRCA of the monophyletic Lotus genus had an EN of 2.98 and an EF of 4.36. GS was quite high for the MRCA of the Vicia-Lathyrus cluster: 4.89 pg, with an EN of 4.71 and an EF of 4.81. Lathyrus sylvestris was an exception, with a very large GS and low EN (see 4.2). The MRCA of Trifolium dubium and its sister taxon T. campestre had a GS of 0.56 pg, a nutrient-rich EN of 4.49 and a moist EF of 5.79. The MRCA of the majority of the Trifolium genus had a GS of 0.8 pg, an EN of 2.75 and an EF of 3.95. There is a slight increase in GS with a concurrent increase in EN toward the bottom of the phylogenetic tree: the MRCA of T. ornithopodioides and T. fragiferum had a GS of 0.76 pg, a high EN of 4.16 and an EF of 4.


5. Discussion

5.1 Overall genome size trends in UK Fabaceae

The vast majority of UK Fabaceae species in this study had very small or small genomes, consistent with the notion that it is advantageous for plants to have smaller genomes. This raises questions such as whether there is an upper limit to GS and what problems plants with larger genomes face. Highly endangered plant species on the International Union for Conservation of Nature’s (IUCN) Red List were found to contain a large proportion of species with large genomes (Vinogradov, 2003). The ‘large genome constraint hypothesis’ of Knight et al. (2005) holds that genera with large genomes are less speciose and are excluded from extreme environments at either end of the scale. This pattern can be seen in the data: in EN category ‘5’ there are ‘very large’ genomes, whereas in category ‘6’ there are none. If the outlier L. sylvestris (see section 5.4) is ignored, this pattern holds at the lower end of the EN scale too. This reveals a bell-shaped distribution of GS in relation to EN. Likewise, GS decreases at both ends of the EF scale, with the higher EF categories ‘7’ and ‘8’ relating only to plants with ‘very small’ genomes (Figure 8a). GS increases with moisture up to a threshold around EF ‘4’ or ‘5’; beyond this, genomes do not continue to increase. This supports the idea that there may be an upper limit to GS. Perhaps there is a constraint on cell size and associated parameters such as cell cycle duration (Beaulieu et al., 2008). Cell cycle duration ranges from hours to weeks and is related to GS amongst other contributing factors (Bennett, 1971). This in turn leads to further layers of complexity, such as the life form or life strategy of the plant (Beaulieu et al., 2008).
Ruderals or weeds are generally considered to have small genomes due to the need to reproduce quickly. Similarly, stress-tolerant plants tend to have smaller genomes, whereas competitors may have larger genomes (Bennett et al., 1998).

5.2 The effects of Nitrogen on GS

Plants with larger genomes may be excluded from some ecosystems which are low in N and P (Guignard et al., 2017), while plants with smaller genomes may be more flexible in the environments they can occupy. The Park Grass Experiment site at Rothamsted, UK, has undergone various fertilisation treatments of nitrogen and phosphorus, together and separately in different plots, since 1856 (Guignard et al., 2016). Plots fertilised with N and P combined favoured plants with larger genomes, in terms of species composition and production of biomass (Guignard et al., 2016). The overall diversity was lower in plots where larger-genome plants dominated (Guignard et al., 2016). The significant effect of EN on GS observed in this study could be due to independence from environmental nutrient conditions in small-genome plants and restriction by environmental nutrient levels in larger-genome plants (Guignard et al., 2016). In this way GS could be a predictor of plant distributions and of responses to climate change or anthropogenic destruction of habitats that alters nutrient availability.

5.3 The effects of moisture on GS

Positive correlations have been found between water availability or annual mean precipitation and GS (Knight et al., 2005). The annual groundwater level has been found to correlate well with EF (Schaffers, 2000). The Trifolium species had a GS of around half that of the Vicia species; similarly, GCS in Vicia was at least double that of the Trifolium species. Whilst there are too few Vicia species to draw conclusions, this trend supports current evidence and thinking that genome size correlates with GCS (Beaulieu et al., 2008). Although GCS has been shown to have an inverse relationship with stomatal density, the increase in cell size maintains increased transpiration rates overall (Beaulieu et al., 2008). In this case the data were insufficient to report on density, although this information should be included in subsequent studies. Leaves with smaller stomata have higher rates of gas exchange and faster dynamic characteristics (Drake et al., 2013). This means that stomatal reaction times are quicker and the water potential gradient is larger when stomata are smaller (Drake et al., 2013). Smaller stomata close faster, which prevents water loss in dry environments; this could explain the significant effect of EF on GS, via the proxy of stomatal size (Drake et al., 2013). The significant effect of the interaction between water and nitrogen on genome size could arise because water is the primary pressure on GS, and only once this minimal requirement is met can nutrient levels make a difference.

The strong phylogenetic signal found in the PGLS could mean that many of the EF, EN and GS values are more likely to reflect the phylogenetic relationships between species than to indicate an evolutionary advantage.

5.4 Outlier Lathyrus sylvestris

The Lathyrus and Vicia genera form a monophyletic clade (Figures 5-7), which may represent an independent expansion of GS in the evolutionary history of the UK Fabaceae species. Lathyrus sylvestris is an extreme outlier within this clade; it has the largest genome in the dataset despite having a low EN of ‘2’ and a mid-range EF of ‘4’. A potential cause of this larger than expected genome in a nutrient-poor area could be supplementary nitrogen mining by rhizobia and/or mycorrhizal associations. Consequently, a factor to incorporate into future GS research is information on mycorrhizal associations. Arbuscular and ectomycorrhizal associations source different nutrients in different quantities, which could impact GS in associated species, although no concrete studies have been done on this yet. The number of nodules can be affected by a variety of mutations, and different genera have different types of nodules, which affects the amount of nitrogen the host plant obtains (Stougaard, 2001; Wagner, 2011). It could be that L. sylvestris forms symbioses which allowed recruitment of enough nitrogen to build such a large genome, despite the generally poor conditions it grows in. Further study should identify whether Fabaceae-rhizobia symbioses correlate in any way with genome size. It could be hypothesised that species that obtain a large amount of nitrogen from rhizobia have genomes that are disproportionately large for their environment, and that species lacking this extra nitrogen recruitment are forced to have smaller genomes. Nitrogen-fixing Fabaceae may be better equipped to persist in lower-nutrient environments where genome sizes are generally smaller, meaning their genomes can become disproportionately large.

5.5 Evaluation of the study

The strengths of this study lie in its production of evidence that further supports established theory around genome size in relation to water and nitrogen. The limitations of this study are multiple. Firstly, the dataset is incomplete due to inaccessibility of plant material or data for some species. Fabaceae species were only selected for analysis if all three data values (GS, EF and EN) were present, so the entire UK Fabaceae flora is not represented. Secondly, the UK’s temperate climate means the Ellenberg values covered by UK Fabaceae are within a narrow range, so the more extreme ends of the nitrogen and water scales were not represented here. A broader study, for example of African Fabaceae, would likely incorporate species from a wider range of habitats, climates and Ellenberg values, thus revealing what happens to GS at the tail ends of the Ellenberg scales.

The efficacy of Ellenberg indicator scores has been contended, with some research finding only weak correlations between N availability and Ellenberg N-values, despite an increase in biomass production in areas with high Ellenberg N-values (Schaffers and Sykora, 2000). It could be postulated that the nitrogen and water content of species’ niches would be best quantified by measuring physico-chemical properties of the environment (Jarvis et al., 2016). Conversely, measurements can fail to capture true heterogeneity at a site; in such cases indicator scores are more representative (Jarvis et al., 2016). The use of indicator scores also circumvents error and variation in techniques associated with measurements (Jarvis et al., 2016).

Finally, this study focussed on Fabaceae; a study on all angiosperm families in the UK is needed. Poaceae and Asteraceae should be prioritised due to their importance as crops.

5.6 Conclusions

This study tentatively supports the view that correlations of GS with nitrogen and water availability in UK Fabaceae are positive until a threshold is reached, at which point the correlation becomes negative. It is important to note that N and water are not the only factors influencing GS, which is why there are outliers to this trend.
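A relationship that is positive up to a threshold and negative beyond it can be examined by adding a quadratic term to a linear model and locating the turning point of the fitted curve. The following is a minimal sketch in R, not the analysis used in this study: the values are synthetic and purely illustrative, and the object names (en, gs, hump, turning_point) are hypothetical.

```r
# Hypothetical illustration only: synthetic values, NOT data from this study.
en    <- rep(1:9, each = 3)                 # Ellenberg N style scores, 1 to 9
noise <- rep(c(-0.1, 0, 0.1), times = 9)    # small symmetric scatter
gs    <- 3 - 0.2 * (en - 5)^2 + noise       # hump-shaped response peaking at en = 5

# Quadratic model: a negative squared term indicates a hump-shaped relationship.
hump <- lm(gs ~ en + I(en^2))
b    <- coef(hump)

# Turning point of the fitted parabola: -b1 / (2 * b2).
turning_point <- unname(-b["en"] / (2 * b["I(en^2)"]))
turning_point
```

With real data, the sign and significance of the squared term would indicate whether a threshold is supported, and the turning point would estimate where the relationship changes direction.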

6. Tables and Figures

6.1 Tables

Table 1. Table of the seeds requested for this study, from the Millennium Seed Bank (MSB), Kew Gardens, using the online MSB Seed List.

Country of origin   Reference number   Taxon name                    Supplied
UK                  177032             Astragalus alpinus            Yes
Italy               695253             Astragalus danicus            Yes
Belgium             175197                                           No
France              970923             Gentiana pneumonanthe         Yes
Switzerland         52685              Gentiana verna                Yes
UK                  130408             Glaux maritima                Yes
UK                  643069                                           No
UK                  9346               Ornithopus perpusillus        No
Greece              75909              Ornithopus pinnatus           Yes
USA                 550321             Oxytropis campestris          Yes
Greece              31804              Trifolium bocconei            Yes
Italy               31549              Trifolium ochroleucon         Yes
UK                  7559               Trifolium ornithopodioides    Yes
France              55479              Trifolium suffocatum          Yes
UK                  170206             Ulex minor                    No
Greece              11282              Vicia parviflora              Yes
Greece              11684              Vicia sativa subsp. nigra     Yes

Table 2. Table of the genome sizes of reference standard species used in flow cytometry.

Reference standards              Genome size 1C value in picograms (pg)
Solanum lycopersicum (tomato)    0.98 pg
Petroselinum crispum (parsley)   2.22 pg
Pisum sativum (pea)              4.86 pg

Table 3. Table of flow cytometry measurements with calculations of genome size, including confidence intervals, for target sample species and the reference standard used.

MSB#     Species name                 Standard   Av sample   Av standard   GS 2C      GS 1C      Cv sample   Cv standard   SD(R) sample
N/A      Ulex minor                   Petro      341.1333    226.8467      6.77       3.38       3.51        4.04          0.021
7559     Trifolium ornithopodioides   Petro      218.98      559.6267      1.760835   0.880417   4.053333    2.206667      0.04285246
31549    Trifolium ochroleucon        Petro      164.3733    616.29        1.200214   0.600107   4.823333    1.736667      0.104241706
31804    Trifolium bocconei           Petro      230.6433    573.71        1.809093   0.904547   4.553333    2.21          0.048675798
55479    Trifolium suffocatum         Petro      184.28      725.71        1.142688   0.571344   4.746667    1.583333      0.045431267
550321   Oxytropis campestris         Petro      422.4633    208.15        9.133245   4.566623   2.88        3.516667      0.006658328
52685    Gentiana verna               Petro      351.4133    285.8467      5.532197   2.766098   1.983333    3.95          0.009237604
970923   Gentiana pneumonanthe        Petro      692.0933    196.9567      15.81272   7.906358   2.186667    2.183333      0.003464102
695253   Astragalus danicus           Pisum      134.2       952.7167      1.280421   0.64021    5.043333    1.19          0.105196641
177032   Astragalus alpinus           Pisum      194.6367    444.19        3.983087   1.991543   4.256667    1.9           0.014433757
75909    Ornithopus pinnatus          Pisum      145.2       929.33        1.420236   0.710118   4.853333    1.25          0.020033306
130408   Glaux maritima               Pisum      234.9967    676.34        3.158352   1.579176   2.113333    1.67          0.008717798
11282    Vicia parviflora             Pisum      252.2033    404.2667      5.670832   2.835416   3.73        1.996667      0.008962886
11684    Vicia sativa                 Pisum      197.3767    407.61        4.401643   2.200822   5.09        2.18          0.041860881
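The genome size columns in Table 3 follow from the ratio of the mean sample peak position to the mean standard peak position, multiplied by the genome size of the standard. The following is a minimal sketch in R under two assumptions: a linear fluorescence scale, and a 2C value of 4.50 pg for the Petroselinum standard, chosen here because it reproduces the tabulated Ulex minor estimate. The helper name estimate_gs is hypothetical and is not part of any package.

```r
# Estimate genome size from flow cytometry peak means.
# estimate_gs() is a hypothetical helper written for this illustration.
estimate_gs <- function(sample_peak, standard_peak, standard_2c) {
  gs_2c <- sample_peak / standard_peak * standard_2c  # 2C value in pg
  list(gs_2c = gs_2c, gs_1c = gs_2c / 2)              # 1C is half of 2C
}

# Ulex minor peak means from Table 3.
# standard_2c = 4.50 pg is an assumption, not a value stated in the text.
ulex <- estimate_gs(sample_peak = 341.1333,
                    standard_peak = 226.8467,
                    standard_2c = 4.50)
round(ulex$gs_2c, 2)
round(ulex$gs_1c, 2)
```

The same function could be applied to the rows measured against Pisum by supplying the 2C value used for that standard.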

Stomatal measurements

Table 4. Table of guard cell length (GCL) measurements obtained from Fabaceae species, with GS.

Species name                  Average stomatal guard cell length (μm)   Genome size 1C (pg)
Trifolium ornithopodioides    15.92                                     0.88
Trifolium bocconei            13.17                                     0.9
Trifolium ochroleucon         14.01                                     0.6
Vicia parviflora              29.85                                     2.2
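The relationship between guard cell length and genome size in Table 4 can be summarised with a correlation coefficient and a comparison of genus means. The following is a minimal sketch in R using only the four tabulated values; with so few species it is illustrative rather than conclusive, and the object names (stomata, r_gcl_gs, by_genus) are hypothetical.

```r
# Values transcribed from Table 4 (four species only).
stomata <- data.frame(
  species = c("Trifolium ornithopodioides", "Trifolium bocconei",
              "Trifolium ochroleucon", "Vicia parviflora"),
  gcl_um  = c(15.92, 13.17, 14.01, 29.85),  # guard cell length (micrometres)
  gs_1c   = c(0.88, 0.90, 0.60, 2.20)       # genome size 1C (pg)
)

# Pearson correlation between guard cell length and genome size.
r_gcl_gs <- cor(stomata$gcl_um, stomata$gs_1c)

# Genus-level means, taking the genus as the first word of the species name.
stomata$genus <- sub(" .*", "", stomata$species)
by_genus <- aggregate(cbind(gcl_um, gs_1c) ~ genus, data = stomata, FUN = mean)

# Ratio of Vicia to Trifolium mean guard cell length.
gcl_ratio <- by_genus$gcl_um[by_genus$genus == "Vicia"] /
             by_genus$gcl_um[by_genus$genus == "Trifolium"]

r_gcl_gs
by_genus
gcl_ratio
```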

6.2 Figures

Figure 1 Ulex minor found on a field trip in Hainault Forest, coordinates: 51°37'09.5"N, 0°07'57.1"E.

Stomatal images

Flow cytometry histogram results (panels a-n)

Figure 3. Flow histograms (a-n)

a. Flow histogram of field-collected Ulex minor superior to Petroselinum crispum with General Purpose Buffer 3PVPBmet (1a).

b. Flow histogram of MSB# 11282 Vicia parviflora inferior to Pisum sativum General Purpose Buffer 3PVPBmet (1b)

c. Flow histogram of MSB# 11684 Vicia sativa subsp. nigra inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1c)

d. Flow histogram of MSB# 130408 Glaux maritima inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1a)

e. Flow histogram of MSB# 31549 Trifolium ochroleucon inferior to Petroselinum crispum General Purpose Buffer 3PVPBmet (2a).

f. Flow histogram of MSB# 31804 Trifolium bocconei inferior to Petroselinum crispum General Purpose Buffer 3PVPBmet (1b)

g. Flow histogram of MSB# 550321 Oxytropis campestris superior to Petroselinum crispum General Purpose Buffer 3PVPBmet (1b)

h. Flow histogram of MSB# 55479 Trifolium suffocatum inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1c)

i. Flow histogram of MSB# 177032 Astragalus alpinus inferior to Pisum sativum General Purpose Buffer 3PVPBmet (1b)

j. Flow histogram of MSB# 695253 Astragalus danicus inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1a)

k. Flow histogram of MSB# 7559 Trifolium ornithopodioides inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1b)

l. Flow histogram of MSB# 75909 Ornithopus pinnatus inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1b)

m. Flow histogram of MSB# 970923 Gentiana pneumonanthe superior to Petroselinum crispum with General Purpose Buffer 3PVPBmet (1c)

n. Flow histogram of MSB# 52685 Gentiana verna superior to Petroselinum crispum with General Purpose Buffer 3PVPBmet (1c)

Figure 4. Pie chart showing the distribution of genome size groups used for the sake of this study, with their percentages.

Figure 5. Divergence of genome size in the UK Fabaceae flora

Figure 5. Phylogenetic tree of the UK Fabaceae flora analysed in this study, constructed in R by pruning the larger, more inclusive phylogenetic tree of Smith and Brown (2018). Branch colour gradient reflects the GS; smallest genomes are dark blue and largest are red.

Figure 6. Divergence of genome size in the UK Fabaceae flora (left) and EF values (right)

Figure 6. Left hand side: As in Figure 5, Phylogenetic tree of the UK Fabaceae flora analysed in this study, constructed in R from a pruned Smith and Brown (2018) larger, more inclusive phylogenetic tree. Branch colour gradient reflects the GS; smallest genomes are a dark blue and largest are red. Numbers for the ancestral reconstructed 1C GS values measured in picograms (pg), are visible at most recent common ancestor (MRCA) nodes. Right hand side: EF colour gradient phylogenetic tree; lowest EF values in dark blue, highest EF values in red. Numbers for ancestral reconstructed EF values are visible at MRCA nodes.

Figure 7. Divergence of genome size in the UK Fabaceae flora (left) and EN values (right)

Figure 7. Left hand side: As in Figure 5, Phylogenetic tree of the UK Fabaceae flora analysed in this study, constructed in R from a pruned Smith and Brown (2018) larger, more inclusive phylogenetic tree. Branch colour gradient reflects the GS; smallest genomes are a dark blue and largest are red. Numbers for the ancestral reconstructed 1C GS values measured in picograms (pg), are visible at most recent common ancestor (MRCA) nodes. Right hand side: EN colour gradient phylogenetic tree; lowest EN values in dark blue, highest EN values in red. Numbers for ancestral reconstructed EN values are visible at MRCA nodes.

The relationship between Genome size and Ellenberg F value (a)

The relationship between Genome size and Ellenberg N value (b)

Figure 8a) Plot of Ellenberg F (water) values against genome size. The x axis is the Ellenberg F value, the y axis is the genome size. Figure 8b) Plot of Ellenberg N (nitrogen) values against genome size. The x axis is the Ellenberg N value, the y axis is the genome size.

7. References

Beaulieu, J., Leitch, I., J., Patel, S., Pendharkar, A., Knight, C., A. 2008. Genome size is a strong predictor of cell size and stomatal density in angiosperms. New Phytologist. 179: 975-986.

Beerling, D. J., & Franks, P. J. 2010. The hidden cost of transpiration. Nature. 464: 495-496.

Bennett, M. D. 1971. The duration of meiosis. Proceedings of the Royal Society of London. Series B. Biological Sciences, 178:1052, 277-299.

Bennett, M. D., & Smith, J. B. 1976. Nuclear DNA amounts in angiosperms. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 274:933, 227-274.

Bennett, M. D., Leitch, I. J., & Hanson, L. 1998. DNA amounts in two samples of angiosperm weeds. Annals of Botany. 82, 121-134.

Bennett, M., D., and Leitch, I., J. 2005. Genome Size Evolution in Plants. In: The Evolution of the Genome. Gregory, T., R., (ed.) Elsevier Academic. 89-162.

Bertolino, L., T., Caine, R., S., Gray, J., E. 2019. Impact of stomatal density and morphology on water-use efficiency in a changing world. Frontiers in Plant Science. 10: 225.

Botanical Society of Britain & Ireland and the Biological Records Centre (BRC). Online Atlas of the British and Irish Flora.

Dodsworth, S., Leitch, A., Leitch, I. 2015. Genome size diversity in angiosperms and its influence on gene space. Current Opinion in Genetics & Development. Elsevier. 35: 73-78.

Doyle, J., J. 2001. Leguminosae. Encyclopaedia of Genetics. 1081-1085.

Drake, P. L., Froend, R. H., & Franks, P. J. 2013. Smaller, faster stomata: scaling of stomatal size, rate of response, and stomatal conductance. Journal of experimental botany. 64(2): 495-505.

Ellenberg, H., Weber, H.E., Düll, R., Wirth, W., Werner, W. and Paulissen, D. 1992. Indicator values of plants in Central Europe. Scripta Geobotanica.18, 1-258.

Essex Ecology Services (EECOS). 2010. Epping Forest District Local Wildlife Sites Review. http://www.efdclocalplan.org/wp-content/uploads/2017/12/LoWS-EB708-A2-sites-81-150.pdf

Franks, P. J., & Farquhar, G. D. 2007. The mechanical diversity of stomata and its significance in gas-exchange control. Plant Physiology, 143(1): 78-87.

Franks, P. J., & Beerling, D. J. 2009. Maximum leaf conductance driven by CO2 effects on stomatal size and density over geologic time. Proceedings of the National Academy of Sciences. 106(25): 10343–10347.

Franks, P., J., Beerling, D., J. 2009. CO2-forced evolution of plant gas exchange capacity and water- use efficiency over the Phanerozoic. Geobiology. 7:225-236.

Freckleton, R., P., Harvey, P., H., Pagel, M. 2002. Phylogenetic Analysis and Comparative Data: A Test and Review of Evidence. The American Naturalist. 160:6.

Greilhuber, J., and Leitch, I., J. 2013. Genome Size and the Phenotype. In: Greilhuber J., Dolezel J., Wendel J. (eds) Plant Genome Diversity Volume 2. Springer, Vienna: Springer. 323-344.

Greilhuber J., Dolezel, J., Lysak, M., Bennett, M. 2005. The origin, evolution and proposed stabilization of the terms ‘genome size’ and ‘c-value’ to describe nuclear DNA contents. Annals of Botany. 95:1. 255-260.

Guignard, M., Nichols, R., Knell, R., Macdonald, A., Romila, C-A., Trimmer, M., Leitch, I., Leitch, A. 2016. Genome Size and ploidy influence species’ biomass under nitrogen and phosphorus limitation. New Phytologist. 210: 1195–1206.

Guignard, M. S., Leitch, A. R., Acquisti, C., Eizaguirre, C., Elser, J. J., Hessen, D. O. Soltis, D. E. 2017. Impacts of nitrogen and phosphorus: from genomes to natural ecosystems and agriculture. Frontiers in Ecology and Evolution. 5, 70.

Hawkins, J. Proulx, S. R., Rapp, A., R., Wendel, J. F. 2009. Rapid DNA loss as a counterbalance to genome expansion through retrotransposon proliferation in plants. PNAS. 106: 42. 17811 - 17816.

Hill, M.O., Mountford, J.O., Roy, D.B. and Bunce, R.G.H. 1999. Ellenberg's indicator values for British plants. ECOFACT Volume 2 Technical Annex (Vol. 2). Institute of Terrestrial Ecology.

IPCC. 2014. Climate Change 2014 Synthesis Report Summary for Policymakers.

Jarvis, S., G., Rowe, E., C., Henrys, P., A., Smart, S., M., Jones, L., Garbutt, A. 2016. Empirical realised niche models for British coastal plant species. Journal of Coastal Conservation. 20, 2: 107-116.

Royal Botanic Gardens, Kew. 2012. Plant DNA C-values database.

Knight, C. A., & Ackerly, D. D. 2002. Variation in nuclear DNA content across environmental gradients: a quantile regression analysis. Ecology Letters, 5,1: 66-76.

Knight, C. A., Molinari, N. A., & Petrov, D. A. 2005. The large genome constraint hypothesis: evolution, ecology and phenotype. Annals of Botany, 95(1): 177-190.

Leitch, I., J. & Leitch, A. 2013. Genome Size Diversity and Evolution in Land Plants. In: Greilhuber, J., Dolezel, J. (eds) Plant Genome Diversity Volume 2. Royal Botanic Gardens, Kew, 307-340.

Masterson, J. (1994). Stomatal size in fossil plants: evidence for polyploidy in majority of angiosperms. Science, 264(5157), 421-424.

Münkemüller, T., Lavergne, S., Bzeznik, B., Dray, S., Jombart, T., Schiffers, K., Thuiller, W. 2012. How to measure and test phylogenetic signal. Methods in Ecology and Evolution. 3. 743–756.

Orme, D., Freckleton, R., Thomas, G., Petzoldt, T., Fritz, S., Isaac, N., Pearse, W. 2018. caper: Comparative Analyses of Phylogenetics and Evolution in R. R package version 1.0.1. https://CRAN.R-project.org/package=caper

Paradis E. & Schliep K. 2018. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35: 526-528.

Pellicer, J., Kelly, L., J., Leitch, I., J., Zomlefer, W., B., Fay, M., F. 2014. A universe of dwarfs and giants: genome size and chromosome evolution in the monocot family Melanthiaceae. New Phytologist. 201: 1484-1497.

Young, J., P., W., Haukka, K., E. 1996. Diversity and phylogeny of rhizobia. New Phytologist. 133:1.

Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., R Core Team. 2019. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-141. https://CRAN.R-project.org/package=nlme

RBG Kew MSB Seed List. http://apps.kew.org/seedlist/SeedlistServlet

R Core Team. 2019. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

Revell, L., J., Harmon, L., J., Collar, D. 2008. Phylogenetic Signal, Evolutionary Process, and Rate. Systematic Biology. 57:4. 591-601.

Revell, L. J. 2012. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3 217-223.

Rose, F., 2006. The Wildflower Key – How to identify wild plants, trees and shrubs in Britain and Ireland. Penguin books Ltd. 266-276.

Rupp, B., et al. 2010. Genome Size in Polystachya (Orchidaceae) and its relationships to epidermal characters. Botanical Journal of the Linnean Society. 163: 223-233.

RStudio Team. 2015. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL http://www.rstudio.com/.

Schaffers, A. and Sykora K. V. 2000. Reliability of Ellenberg indicator values for moisture, nitrogen and soil reaction: a comparison with field measurements. Journal of Vegetation Science. 11: 225-244.

Schliep K.P. 2011. phangorn: phylogenetic analysis in R. Bioinformatics, 27(4) 592-593.

Schliep, K., Potts, A. J., Morrison, D. A., Grimm, G. W. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution, 8: 1212-1220. doi: 10.1111/2041-210X.12760

Scott Chamberlain and Eduard Szocs. 2013. taxize - taxonomic search and retrieval in R. F1000Research, 2:191.

Scott Chamberlain, Eduard Szoecs, Zachary Foster, Zebulun Arendsee, Carl Boettiger, Karthik Ram, Ignasi Bartomeus, John Baumgartner, James O'Donnell, Jari Oksanen, Bastian Greshake Tzovaras, Philippe Marchand, Vinh Tran, Maëlle Salmon, Gaopeng Li, and Matthias Grenié. 2019. taxize: Taxonomic information from around the web. R package version 0.9.7.

Smith, S., A., and Brown, J., W. 2018. Constructing a broadly inclusive seed plant phylogeny. American Journal of Botany. 105:3.

Stougaard, J. 2001. Genetics and genomics of root symbiosis. Current Opinion in Plant Biology. 4: 328-335.

Sysmex, n.d. Flow Cytometer Theory Sysmex-Partec GmbH (SPG) R&I FCM Sales & Marketing.

Symonds, M., Blomberg, S., P. 2014. A Primer on Phylogenetic Generalised Least Squares. In: Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology: Concepts and Practice. Springer.

Soltis, D., E., Soltis, P., S., Bennett, M., D., Leitch, I.,J. 2003. Evolution of genome size in the angiosperms. American Journal of Botany. 90 (11): 1596-1603.

Tedersoo, L., Laanisto, L., Rahimlou, S., Toussaint, A., Halikma, T., Partel, M. 2018. Global database of plants with root-symbiotic nitrogen fixation: NodDB. Journal of Vegetation Science. 1-9.

The Legume Phylogeny Working Group (LPWG). 2017. A new subfamily classification of the Leguminosae based on a taxonomically comprehensive phylogeny. TAXON. 66(1): 44-77.

Vinogradov, A, E. 2003. Selfish DNA is maladaptive: evidence from the plant Red List. Trends in Genetics. 19(11): 609-614.

Veselý , P., Bureš, P., Šmarda, P., & Pavlíček, T. (2011). Genome size and DNA base composition of geophytes: the mirror of phenology and ecology?. Annals of Botany, 109(1), 65-75.

Vessey, J. K. (2003). Plant growth promoting rhizobacteria as biofertilizers. Plant and soil, 255(2), 571-586.

Wagner, S. C. 2011. Biological Nitrogen Fixation. Nature Education Knowledge 3(10):15.

Wanstead Wildlife. 2014. List of Plants found in the City of London Cemetery. https://www.wansteadwildlife.org.uk/index.php/en/component/tags/tag/plants

Wendel, J. F. 2015. The wondrous cycles of polyploidy. American Journal of Botany. 102(11): 1753- 17.

Young, J.P.W. 1996. Phylogeny and taxonomy of rhizobia. Plant and Soil. 186: 45-52.

8. Appendices and supplementary material

Statistical tests on the data in R

R Code with comments marked by a ‘#’ and R output in italics

# Import data and assign it to a variable name "mydata".
mydata <- read.csv(file.choose(), header = TRUE)

# Check data has imported correctly.
str(mydata)

# Plot a scatter graph of the data to visualise any relationship between the two variables in the data. Plot F category on the X axis and Genome size on the Y axis.
plot(mydata$Ellenberg_F, mydata$Genome_size) # (see Figure 8a)

# Initial observation: Most data points have the Ellenberg F value of 4; the greatest range of genome sizes is also in this category. Ellenberg 5 also has more data points and a large range of genome size values. Ellenberg 3 and 6 have fewer data points and genome size looks generally smaller. 7 and 8 only have a few data points between them, which have a small genome size. More water seems to correlate with smaller genomes, whereas mid-range Ellenberg F values correlate with a large range of genome sizes, including larger genomes. Generally looks like a slight negative correlation.

# Repeat for Ellenberg N
plot(mydata$Ellenberg_N, mydata$Genome_size) # (see Figure 8b)

# Initial observation: The largest genomes and the largest range are also in the mid-range of Ellenberg N, at values of 4-5. Slight positive correlation; as N increases, GS increases.

# Pearson’s product-moment correlation coefficient is a measure of the strength of the linear relationship between two variables.
# Set up an object to run a correlation test to see if there is a correlation. The X value is the F category and the Y value is the genome size measurement, as GS is a continuous variable.
FabCorTest <- cor.test(mydata$Ellenberg_F, mydata$Genome_size)

# Return the output values from the Pearson’s product-moment correlation test
FabCorTest

# The estimate of the correlation coefficient is 0.095. A perfect positive linear relationship would be 1 and a perfect negative linear relationship would be -1.

# 0.095 is close to zero. Zero indicates that there is no linear relationship.
# As F value decreases the range in genome sizes increases, therefore there are some plants with larger genomes and some with smaller at lower/mid-range Ellenberg numbers.
# The prior predictions were that genome size would be smaller for lower Ellenberg F values, however in this data that is not clearly demonstrated. The estimated relationship is very slight and weakly positive (r = 0.095).
# The t value of the significance test is 0.70.
# The p-value is 0.486, indicating no statistically significant correlation between the variables.

R output
Pearson's product-moment correlation
data: mydata$Ellenberg_F and mydata$Genome_size
t = 0.70143, df = 54, p-value = 0.486
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1721807  0.3491983
sample estimates:
      cor
0.0950211

FabCorTest2 <- cor.test(mydata$Ellenberg_N, mydata$Genome_size)
FabCorTest2

R output
Pearson's product-moment correlation
data: mydata$Ellenberg_N and mydata$Genome_size
t = 2.1143, df = 54, p-value = 0.03913
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.01466314 0.50284575
sample estimates:
      cor
0.2764978

# The correlation between N and GS is 0.28. This is closer to 1 than the correlation for F but is still fairly close to zero, indicating a weak positive relationship (p = 0.039).
# Fitting a linear model with the response variable on the left, a tilde (~) and then the explanatory variable. In this case, the response variable is Genome_size, which can be explained by Ellenberg_F, the explanatory variable.
MyMod <- lm(mydata$Genome_size ~ mydata$Ellenberg_F)

# To obtain summary statistics
summary(MyMod)

R output
Call:
lm(formula = mydata$Genome_size ~ mydata$Ellenberg_F)

Residuals:
   Min     1Q Median     3Q    Max
-2.642 -1.667 -1.321  1.196  8.222

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          1.5233     1.2414   1.227    0.225
mydata$Ellenberg_F   0.1886     0.2688   0.701    0.486

Residual standard error: 2.373 on 54 degrees of freedom
Multiple R-squared: 0.009029, Adjusted R-squared: -0.009322
F-statistic: 0.492 on 1 and 54 DF, p-value: 0.486
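The correlation test and the simple linear model above report the same relationship in two ways: for a single explanatory variable, the t statistic follows from r and the degrees of freedom, and the multiple R-squared equals r squared. The following is a minimal sketch in R that recomputes these quantities from the reported values for Ellenberg F; the object names (t_stat, p_val, r_squared) are hypothetical.

```r
# Values reported in the cor.test() output for Ellenberg F and genome size.
r  <- 0.0950211
df <- 54

# t statistic for a Pearson correlation: t = r * sqrt(df) / sqrt(1 - r^2).
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df)

# For a single-predictor linear model, R-squared is the squared correlation.
r_squared <- r^2

round(c(t = t_stat, p = p_val, R2 = r_squared), 4)
```

These values can be compared with the t value, p-value and Multiple R-squared printed in the two outputs above.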

# The estimated intercept is 1.52. The slope that relates genome size to Ellenberg F value is 0.19. The p-value is 0.486, not significant.
# Diagnostic plots check that the data conform to the assumptions of a linear model. The first assumption is normally distributed errors: the differences between predicted and observed values, called residuals, should follow a normal distribution. There is also an assumption that the variance is the same across the range of the response variable. If the variance increases with the mean, the data can be transformed using a sqrt() or log transformation to reduce increases in variance, or a generalised linear model with Poisson errors can be used. Linear regression fits a straight line, so if the relationship is curved another model would be more appropriate.

# Check how well the data conform to these assumptions
plot(MyMod)

R’s Quantile-Quantile plot output

# The data are positively skewed, see the Q-Q plot. Log-transform the data by using the log10() function.

TransGenome_size2 <- summary(log10(mydata$Genome_size))
TransEllenberg_F2 <- summary(log10(mydata$Ellenberg_F))
TransEllenberg_N2 <- summary(log10(mydata$Ellenberg_N))

TransMyMod2 <- lm(TransGenome_size2 ~ TransEllenberg_F2)
summary(TransMyMod2) # significant, p = 0.002

R output
Call:
lm(formula = TransGenome_size2 ~ TransEllenberg_F2)

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.03355 -0.21522  0.07142  0.05232  0.21339 -0.08836

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -2.2500     0.3451  -6.520  0.00286 **
TransEllenberg_F2   3.7201     0.5183   7.178  0.00200 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1648 on 4 degrees of freedom
Multiple R-squared: 0.928, Adjusted R-squared: 0.9099
F-statistic: 51.52 on 1 and 4 DF, p-value: 0.001995

# Repeat for the effect of Ellenberg N on GS
TransMyMod3 <- lm(TransGenome_size2 ~ TransEllenberg_N2)
summary(TransMyMod3) # significant, p = 0.003

R output
Call:
lm(formula = TransGenome_size2 ~ TransEllenberg_N2)

Residuals:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 0.16569 -0.11009 -0.15038 -0.09626 -0.05978  0.25082

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)        -0.6743     0.1552  -4.344  0.01221 *
TransEllenberg_N2   1.8566     0.2946   6.303  0.00324 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1857 on 4 degrees of freedom
Multiple R-squared: 0.9085, Adjusted R-squared: 0.8856
F-statistic: 39.72 on 1 and 4 DF, p-value: 0.003239
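In the transformation step above, each log10() call is wrapped in summary(), so each transformed object holds six summary statistics (minimum, quartiles, mean and maximum) rather than one value per species; this is consistent with the 4 residual degrees of freedom in the two outputs. The following is a minimal sketch in R of the same log-log model fitted to every observation instead. The data frame here is a randomly generated stand-in with the same column names and 56 rows, labelled hypothetical because the study's data file is not reproduced; the object names six_numbers and LogMod are also hypothetical.

```r
# Hypothetical stand-in data: random values, NOT the data from this study.
set.seed(1)
mydata <- data.frame(
  Ellenberg_F = sample(3:8, 56, replace = TRUE),
  Ellenberg_N = sample(2:7, 56, replace = TRUE)
)
mydata$Genome_size <- exp(rnorm(56, mean = 0.5, sd = 0.6))  # positive, right-skewed

# summary() collapses a numeric vector to six summary statistics.
six_numbers <- summary(log10(mydata$Genome_size))
length(six_numbers)

# Fit the log-log model to all observations by transforming inside the formula.
LogMod <- lm(log10(Genome_size) ~ log10(Ellenberg_F), data = mydata)
df.residual(LogMod)  # number of observations minus two estimated coefficients
summary(LogMod)
```

Fitted this way, the residual degrees of freedom match those of the untransformed model (54 for 56 species), and the diagnostic plots from plot(LogMod) can be used to check whether the transformation has reduced the skew.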

# The R-squared is high, thus we can infer that the model fits the data well. R-squared is the proportion of the variance in the response variable that is explained by the linear model.
# So far we have only looked at whether water alone or nitrogen alone has a significant effect on genome size. We believe that there could be an interaction. The combination of high water and high nutrients may encourage larger genomes, as the plant can only use the high nutrients when water is also high.

# Run a linear model on log transformed data, with an interaction term.
Interaction1 <- lm(TransGenome_size ~ TransEllenberg_F * TransEllenberg_N)
summary(Interaction1)

R output
Call:
lm(formula = TransGenome_size ~ TransEllenberg_F * TransEllenberg_N)

Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 0.003632 -0.006909 -0.025635  0.032065 -0.001320 -0.001833

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                         1.5448     0.5856   2.638   0.1186
TransEllenberg_F                   -2.7242     0.9842  -2.768   0.1095
TransEllenberg_N                   -2.4021     0.5661  -4.243   0.0513 .
TransEllenberg_F:TransEllenberg_N   5.1428     0.9912   5.188   0.0352 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02959 on 2 degrees of freedom
Multiple R-squared: 0.9971, Adjusted R-squared: 0.9927
F-statistic: 229.2 on 3 and 2 DF, p-value: 0.004347

# Here the interaction term is significant (p = 0.035); the p-value for the overall model is 0.004.
# Repeat interaction lm() with explanatory variables the other way around

Interaction2 <- lm(TransGenome_size ~ TransEllenberg_N * TransEllenberg_F)
summary(Interaction2)

R output
Call:
lm(formula = TransGenome_size ~ TransEllenberg_N * TransEllenberg_F)

Residuals:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
 0.003632 -0.006909 -0.025635  0.032065 -0.001320 -0.001833

Coefficients:
                                  Estimate Std. Error t value Pr(>|t|)
(Intercept)                         1.5448     0.5856   2.638   0.1186
TransEllenberg_N                   -2.4021     0.5661  -4.243   0.0513 .
TransEllenberg_F                   -2.7242     0.9842  -2.768   0.1095
TransEllenberg_N:TransEllenberg_F   5.1428     0.9912   5.188   0.0352 *
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02959 on 2 degrees of freedom
Multiple R-squared: 0.9971, Adjusted R-squared: 0.9927
F-statistic: 229.2 on 3 and 2 DF, p-value: 0.004347

# The interaction term is significant (p = 0.035); the p-value for the overall model is 0.004.
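The two interaction outputs above are identical because A * B and B * A in an R formula specify the same model; only the order in which the coefficients are printed changes. The following is a minimal sketch in R that demonstrates this and compares the interaction model with an additive one. The data are randomly generated and purely illustrative, and all object and column names (toy, ef, en, log_gs, m_fn, m_nf, additive) are hypothetical.

```r
# Hypothetical illustration only: random values, NOT data from this study.
set.seed(2)
toy <- data.frame(ef = runif(30, 3, 8), en = runif(30, 2, 7))
toy$log_gs <- 0.1 * toy$ef + 0.05 * toy$en + 0.02 * toy$ef * toy$en +
  rnorm(30, sd = 0.1)

# The same interaction model written with the terms in either order.
m_fn <- lm(log_gs ~ ef * en, data = toy)
m_nf <- lm(log_gs ~ en * ef, data = toy)

# Fitted values agree, so the two formulas describe one model.
same_fit <- isTRUE(all.equal(unname(fitted(m_fn)), unname(fitted(m_nf))))
same_fit

# Comparing against the additive model tests whether the interaction adds anything.
additive <- lm(log_gs ~ ef + en, data = toy)
anova(additive, m_fn)
```

The anova() comparison is one way to check whether including the interaction term improves on a model with the two main effects alone.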

Phylogenetic tree generation in R

# install and load the required packages
library(ape)
library(phytools)
library(taxize)
setwd("C:/Users/...... ") # set working directory
Fab.tre <- read.tree("FabTreeUK.tre") # read tree
Fab.tre
plot(Fab.tre)

# continued editing of tree
Fab.tre.lad <- ladderize(Fab.tre, right = FALSE) # ladderise tree
chartree <- Fab.tre.lad
chartree$edge.length <- NULL
plot(chartree)
plot(Fab.tre.lad, type = "phylogram", use.edge.length = FALSE)

# match phylogenetic tree and measurements
values <- read.csv("THE FAB DATABASE.csv", row.names = 1)

# trim tree to match data points
# match taxon labels for GS values to phylogenetic tree
GStaxa <- rownames(values)
treetaxa <- Fab.tre$tip.label

# match taxon labels to identify taxa that need to be trimmed
matched <- treetaxa[treetaxa %in% GStaxa] # which taxon labels in the GS data match the tree
matched # look at the taxon labels that match
trimmedtaxa <- treetaxa[!(treetaxa %in% GStaxa)] # taxon labels from tree that don't match GS data (!) i.e. no GS data available
trimmedtaxa # list taxa to trim from tree to match GS data
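The matching step above is a set comparison between the tip labels of the tree and the row names of the data. The following is a minimal sketch in R using short toy vectors; the labels are taken from names that appear in this document, but the vectors themselves are hypothetical and do not represent the real tree or dataset.

```r
# Toy label vectors for illustration only (not the real tree or dataset).
treetaxa <- c("Ulex_minor", "Vicia_parviflora", "Ononis_repens",
              "Lathyrus_palustris")
GStaxa   <- c("Ulex_minor", "Vicia_parviflora", "Trifolium_suffocatum")

# Tips present in the tree but lacking genome size data: candidates to drop.
to_drop <- setdiff(treetaxa, GStaxa)

# Species with genome size data but no tip in the tree.
no_tip <- setdiff(GStaxa, treetaxa)

to_drop
no_tip

# With ape loaded and a tree read in, the computed vector could be passed on
# directly instead of typing the labels out by hand, for example:
# Fab.tre.trim <- drop.tip(Fab.tre, to_drop)
```

Passing the computed vector to drop.tip() avoids the risk of a typed label not matching the tree exactly.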

# trim the tree down
Fab.tre.trim <- drop.tip(Fab.tre, c("Trifolium_occidentale", "Trifolium_squamosum", "Trifolium_strictum", "Trifolium_micranthum", "Ononis_repens", "Ononis_reclinata", "Vicia_sativa_subsp._nigra", "Lathyrus_palustris", "Lathyrus_linifolius", "Ornithopus_perpusillus", "Anthyllis_vulneraria_subsp._lapponica", "Genista_anglica"))

# reverse match
matched2 <- GStaxa[(GStaxa %in% treetaxa)]
matched2
trimmedtaxa2 <- GStaxa[!(GStaxa %in% treetaxa)]
trimmedtaxa2
values2 <- values[setdiff(rownames(values), trimmedtaxa2),]
values2
GScvalues <- setNames(values2$Genome_size, row.names(values2))
GScvalues
EFvalues <- setNames(values2$Ellenberg_F, row.names(values2))
ENvalues <- setNames(values2$Ellenberg_N, row.names(values2))
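The two matching steps (tree against data, then data against tree) can be illustrated with base R set operations. This is a minimal sketch on a hypothetical three-tip, two-row subset; the vectors below are illustrative stand-ins for Fab.tre$tip.label and rownames(values), not the real inputs.

```r
# Hypothetical subsets for illustration only: three tip labels and two data rows.
treetaxa <- c("Trifolium_repens", "Ononis_repens", "Genista_anglica")
GStaxa   <- c("Trifolium_repens", "Vicia_cracca")

# Tips with no genome size data (to be dropped from the tree).
to_drop_from_tree <- setdiff(treetaxa, GStaxa)

# Data rows with no matching tip (to be dropped from the data).
to_drop_from_data <- setdiff(GStaxa, treetaxa)

# Taxa present in both the tree and the data.
shared <- intersect(treetaxa, GStaxa)

print(to_drop_from_tree)
print(to_drop_from_data)
print(shared)
```

With ape loaded, a vector such as to_drop_from_tree could be passed directly to drop.tip() instead of typing each tip label by hand, which avoids transcription errors when the tree or dataset changes.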

# Ancestral state reconstruction
reconGS <- fastAnc(Fab.tre.trim, GScvalues)  # Maximum Likelihood reconstruction of trait onto phylogeny
reconGS
round.reconGS <- round(reconGS, digits=2)
reconF <- fastAnc(Fab.tre.trim, EFvalues)
reconF
round.reconF <- round(reconF, digits=2)
obj <- contMap(Fab.tre.trim, GScvalues, plot=FALSE)  # creates continuous map before plotting data
par(cex=0.6)
obj <- setMap(obj, invert=TRUE)  # invert = TRUE flips the colour gradient so that red is the largest value and blue the smallest
plot(obj, lwd = 2, res = 1000, fsize = c(1,1), type = "phylogram", outline = FALSE, legend = 0.5*max(nodeHeights(Fab.tre)))  # plots the phylogenetic tree with layout options, e.g. text size
obj2 <- obj
class(obj2) <- "phylo"
plotTree(obj2, lwd=1.3, res=1000, fsize=c(0.3,0.3), use.edge.length = FALSE, outline= FALSE, legend=0.5*max(nodeHeights(Fab.tre)))  # plots the phylogeny with layout options, e.g. text size
class(obj)
obj

# Two phylogenetic trees facing each other, showing genome size against Ellenberg F value
p1 <- contMap(Fab.tre.trim, GScvalues, plot=FALSE)  # creates continuous map before plotting data
p1 <- setMap(p1, invert=TRUE)
p2 <- contMap(Fab.tre.trim, EFvalues, plot=FALSE)  # creates continuous map before plotting data
p2 <- setMap(p2, invert=TRUE)
labels <- as.vector(Fab.tre.trim$tip.label)
require(stringr)
labels_ <- str_replace_all(labels, "_", " ")  # remove underscore from species names
layout(matrix(1:3,1,3), widths=c(0.44,0.12,0.44))
par(cex=.55)
plot(p1, lwd=2, res=1000, ftype="off", type = "phylogram", outline= FALSE, legend=0.5*max(nodeHeights(Fab.tre)))
nodelabels(round.reconGS, cex = 1, frame="none")  # numbers the nodes related to recon values
ylim <- c(1-0.12*(length(Fab.tre.trim$tip.label)-1), length(Fab.tre.trim$tip.label))
plot.new(); plot.window(xlim=c(-0.1,0.1), ylim=ylim)
text(rep(0,length(Fab.tre.trim$tip.label)), 1:length(Fab.tre.trim$tip.label), labels_)
plot(p2, lwd = 2, res = 1000, ftype = "off", type = "phylogram", direction ="leftwards", outline = FALSE, legend = 0.5*max(nodeHeights(Fab.tre)))
nodelabels(round.reconF, cex = 1, frame="none")  # numbers the nodes related to recon values

# Version for Ellenberg N against GS.
reconN <- fastAnc(Fab.tre.trim, ENvalues)  # reconstruction for Ellenberg N, so the right-hand node labels match the N tree
round.reconN <- round(reconN, digits=2)
p2 <- contMap(Fab.tre.trim, ENvalues, plot=FALSE)  # creates continuous map before plotting data
p2 <- setMap(p2, invert=TRUE)  # this makes red the biggest value and blue the smallest
labels <- as.vector(Fab.tre.trim$tip.label)
require(stringr)
labels_ <- str_replace_all(labels, "_", " ")  # remove underscore from spp. names
layout(matrix(1:3,1,3), widths=c(0.44,0.12,0.44))
par(cex=.55)
plot(p1, lwd=2, res=1000, ftype="off", type = "phylogram", outline= FALSE, legend=0.5*max(nodeHeights(Fab.tre)))  # this plots the left-hand side (GS side)
nodelabels(round.reconGS, cex = 1, frame="none")  # numbers the nodes related to recon values
ylim <- c(1-0.12*(length(Fab.tre.trim$tip.label)-1), length(Fab.tre.trim$tip.label))
plot.new(); plot.window(xlim=c(-0.1,0.1), ylim=ylim)
text(rep(0,length(Fab.tre.trim$tip.label)), 1:length(Fab.tre.trim$tip.label), labels_)  # this plots the spp. names
plot(p2, lwd=2, res=1000, ftype="off", type = "phylogram", direction="leftwards", outline= FALSE, legend=0.5*max(nodeHeights(Fab.tre)))  # this plots the right-hand phylogeny with N values
nodelabels(round.reconN, cex = 1, frame="none")  # numbers the nodes related to recon values (the original script reused round.reconF here)

Stomatal measurements code in R

# import data
gcldata <- read.csv(file.choose())
View(gcldata)
CorTest <- cor.test(gcldata$Genome_size_1C.pg., gcldata$Average_stomatal_guard_cell_length..m.)
CorTest
# A correlation coefficient of 1 would be a perfect positive relationship. The estimate of 0.98 is very close to 1, indicating a strong positive correlation between genome size and guard cell length (p < 0.05).
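The input file for this test is not reproduced in the appendix, so the following is only a sketch of how such a test could be set up. It assumes that one mean guard cell length per species is the unit of analysis, built from the slide averages in the guard cell tables and the 1C genome sizes in the Fabaceae database given later in this appendix; the data frame and column names are illustrative.

```r
# Assumption: species means of the reported slide averages are the unit of analysis.
# Genome sizes (1C, pg) are taken from the Fabaceae database table in this appendix.
guard_cells <- data.frame(
  species = c("Trifolium bocconei", "Trifolium ornithopodioides",
              "Vicia parviflora", "Trifolium ochroleucon"),
  genome_size_1C_pg = c(0.90, 0.88, 2.20, 0.60),
  mean_guard_cell_length_um = c(
    mean(c(13.19, 12.01, 14.33)),         # slides 1 to 3
    mean(c(15.07, 16.17, 16.53)),         # slides 1 to 3
    mean(c(28.82, 30.08, 30.20, 30.28)),  # slides 1 to 4
    mean(c(12.65, 14.77, 14.60))          # slides 1 to 3
  )
)

# Pearson correlation test between genome size and guard cell length.
result <- cor.test(guard_cells$genome_size_1C_pg,
                   guard_cells$mean_guard_cell_length_um)
print(result$estimate)
print(result$p.value)
```

With only four species the test has 2 degrees of freedom, so the estimate should be read as indicative rather than conclusive.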

CSR code in R

CSR_data <- read.csv(file.choose(), header = TRUE)  # import data
View(CSR_data)
plot(CSR_data$Genome_size, CSR_data$C)  # scatter plot of GS against Competitor CSR value
plot(CSR_data$Genome_size, CSR_data$S)  # scatter plot of GS against Stress-tolerant CSR value
plot(CSR_data$Genome_size, CSR_data$R)  # scatter plot of GS against Ruderal CSR value
# No clear pattern showing in any of these plots.

# Linear model on the effect of Stress-tolerant CSR value on GS
Slm <- lm(CSR_data$Genome_size ~ CSR_data$S)
summary(Slm)

R output
Call:
lm(formula = CSR_data$Genome_size ~ CSR_data$S)

Residuals:
   Min     1Q Median     3Q    Max
-1.602 -1.352 -1.001  1.071  6.141

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.851662   0.724374   2.556   0.0159 *
CSR_data$S  0.002389   0.017481   0.137   0.8922

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.972 on 30 degrees of freedom
Multiple R-squared: 0.0006221, Adjusted R-squared: -0.03269
F-statistic: 0.01868 on 1 and 30 DF, p-value: 0.8922
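The negative adjusted R-squared in this output follows directly from the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below reproduces it from the reported values; n = 32 is inferred from the 30 residual degrees of freedom with one predictor, and the function name is illustrative.

```r
# Adjusted R-squared from multiple R-squared, sample size n and number of predictors p.
adjusted_r_squared <- function(r_squared, n, p) {
  1 - (1 - r_squared) * (n - 1) / (n - p - 1)
}

# Reported multiple R-squared for the Stress-tolerant model.
# n = 32 is inferred from 30 residual degrees of freedom plus two estimated coefficients.
adj_S <- adjusted_r_squared(r_squared = 0.0006221, n = 32, p = 1)
print(round(adj_S, 5))
```

A negative value indicates that the predictor explains less variance than would be expected by chance for a model of this size, which is consistent with the non-significant slope.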

# Linear model on the effect of Competitor CSR value on GS
Clm <- lm(CSR_data$Genome_size ~ CSR_data$C)
summary(Clm)

R output:
Call:
lm(formula = CSR_data$Genome_size ~ CSR_data$C)

Residuals:
    Min      1Q  Median      3Q     Max
-2.3735 -1.3692 -0.8028  0.8442  5.8072

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.48278    0.59800   2.480    0.019 *
CSR_data$C   0.02695    0.02894   0.931    0.359

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.945 on 30 degrees of freedom
Multiple R-squared: 0.02809, Adjusted R-squared: -0.004304
F-statistic: 0.8671 on 1 and 30 DF, p-value: 0.3592

plot(Clm)
# Observation: the highest C values come from smaller genome plants.

# Linear model on the effect of Ruderal CSR value on GS
Rlm <- lm(CSR_data$Genome_size ~ CSR_data$R)
summary(Rlm)

R output:
Call:
lm(formula = CSR_data$Genome_size ~ CSR_data$R)

Residuals:
    Min      1Q  Median      3Q     Max
-1.9043 -1.3341 -0.7185  0.9208  6.1378

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.94246    1.13321   2.597   0.0144 *
CSR_data$R  -0.02146    0.02308  -0.930   0.3599

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.945 on 30 degrees of freedom
Multiple R-squared: 0.02801, Adjusted R-squared: -0.004388
F-statistic: 0.8646 on 1 and 30 DF, p-value: 0.3599

plot(Rlm)

# Observation: the slope is negative but not statistically significant (p = 0.36), so any negative correlation is weak at best; the plant with the highest Ruderal value has a medium genome size.
logS <- log(CSR_data$S)  # log-transformed S values, for a linear model on log-transformed data
# Observation: the smaller genome plants generally have larger R values; this could mean that plants with smaller genomes are more likely to be weeds/ruderals.

Phylogenetic Generalised Least Squares model

# Fit a linear model using the technique of phylogenetic generalised least squares (PGLS), first with the packages ‘nlme’ and ‘ape’, and subsequently with the package ‘caper’.
# Install and load the packages required to read the phylogeny and run the analyses.
library(ape)
library(nlme)
library(geiger)
library(phytools)

# Load the dataset into R
datos <- read.csv("THE FAB DATABASE.csv", header = TRUE, row.names=1)
datos

# Load the phylogenetic tree into R.
Fab.tre <- read.tree("FabTreeUK.tre")

# Convert from a ‘.tre’ file to a nexus file.
writeNexus(Fab.tre, file="FabTreeUK.nex")
arbol <- read.nexus("FabTreeUK.nex")

# Use the geiger function name.check to ensure that all the species in the data are in the tree and vice versa.
obj <- name.check(arbol, datos)  # R is case-sensitive, so the object must be named obj to match the calls that follow
obj
arbol.cortado <- drop.tip(arbol, obj$tree_not_data)
name.check(arbol.cortado, datos)
row.names.remove <- obj$data_not_tree
datos.cortados <- datos[!(row.names(datos) %in% row.names.remove), ]
name.check(arbol.cortado, datos.cortados)  # R returns "OK" if the data in the dataset and the phylogenetic tree match

# Add log transformed columns for Ellenberg values and GS
log_E_N <- log(datos.cortados$Ellenberg_N)
log_E_F <- log(datos.cortados$Ellenberg_F)
log_GS <- log(datos.cortados$Genome_size)
datos.cortados$log_GS <- log_GS
datos.cortados$log_E_F <- log_E_F
datos.cortados$log_E_N <- log_E_N

# Now that the tree and dataset match, start to explore the phylogenetic GLS.
# The first models are fitted with gls() from ‘nlme’, using a correlation structure from ‘ape’; ‘caper’ is used for the later models.
# Phylogenetic GLS is a linear model fitting method in which the covariance (correlation) structure between species is permitted to match that expected under a Brownian motion process (or other processes) of evolution on the tree. Consequently, the first step is to define this covariance structure, as follows:
bm <- corBrownian(1, arbol.cortado)
bm

# R returns: Uninitialized correlation structure of class corBrownian
# A variance-covariance structure is now defined based on the model of Brownian evolution.
# The first model fitted is for a simple analysis investigating the relationship between GS and Ellenberg F.
modelo1 <- gls(Genome_size ~ Ellenberg_F, data = datos.cortados, correlation = bm)
plot(Genome_size ~ Ellenberg_F, data = datos.cortados, cex=1.5, pch=21, bg="grey")
abline(modelo1, lwd=2, col = "darkgrey", lty = "dashed")

# Repeat on log transformed data
modelo1 <- gls(log_GS ~ log_E_F, data = datos.cortados, correlation = bm)
plot(log_GS ~ log_E_F, data = datos.cortados, cex=1.5, pch = 21, bg = "grey")
abline(modelo1, lwd=2, col="darkgrey", lty="dashed")
summary(modelo1)
residuals(modelo1)
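To make the idea behind the GLS fit concrete, the sketch below computes GLS coefficients by hand in base R. It is not the model fitted in this study: the four-tip tree ((A:1,B:1):1,(C:1,D:1):1), the covariance matrix V derived from it, and the x and y values are all hypothetical. Under Brownian motion, the covariance between two tips is proportional to the branch length they share from the root, and the GLS estimate is (X'V⁻¹X)⁻¹X'V⁻¹y.

```r
# Hypothetical 4-tip tree ((A:1,B:1):1,(C:1,D:1):1).
# Diagonal = root-to-tip distance (2); off-diagonal = shared branch length.
tips <- c("A", "B", "C", "D")
V <- matrix(c(2, 1, 0, 0,
              1, 2, 0, 0,
              0, 0, 2, 1,
              0, 0, 1, 2),
            nrow = 4, byrow = TRUE, dimnames = list(tips, tips))

# Made-up trait values, for illustration only.
x <- c(3, 4, 5, 6)
y <- c(0.5, 0.8, 3.0, 4.5)
X <- cbind(Intercept = 1, x = x)

# GLS estimator: solve (X' V^-1 X) beta = X' V^-1 y.
V_inv <- solve(V)
beta_gls <- solve(t(X) %*% V_inv %*% X, t(X) %*% V_inv %*% y)

# Equivalent route: "whiten" the data with the Cholesky factor of V,
# then fit ordinary least squares to the transformed variables.
L <- t(chol(V))          # V = L %*% t(L)
X_w <- solve(L, X)
y_w <- solve(L, y)
beta_whitened <- coef(lm(y_w ~ X_w - 1))

print(beta_gls)
print(beta_whitened)
```

The two routes give the same coefficients, which shows that PGLS is ordinary regression after removing the correlation among species that the tree implies.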

# Install and load the required ‘caper’ package
library(caper)

Species <- rownames(datos.cortados)
rownames(datos.cortados) <- NULL
data <- cbind(Species, datos.cortados)
comp.data <- comparative.data(arbol.cortado, data, names.col="Species", vcv.dim=2, warn.dropped=TRUE)

modelo2 <- pgls(Genome_size ~ Ellenberg_F, data=comp.data)
summary(modelo2)
lm.lk <- pgls.profile(modelo2, which="lambda")
# Lambda is 1, which represents a strong phylogenetic signal in all summaries of the data
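Pagel's lambda can be read as a multiplier on the off-diagonal (shared-history) entries of the phylogenetic covariance matrix: lambda = 1 leaves the Brownian motion expectation unchanged, while lambda = 0 treats species as independent. The sketch below shows this transformation in base R on a hypothetical three-tip tree ((A:1,B:1):1,C:2); the function name and matrix are illustrative and are not taken from ‘caper’.

```r
# Scale the off-diagonal covariances by lambda, keeping the diagonal unchanged.
lambda_transform <- function(V, lambda) {
  V_lambda <- V * lambda
  diag(V_lambda) <- diag(V)
  V_lambda
}

# Hypothetical covariance matrix for the tree ((A:1,B:1):1,C:2).
tips <- c("A", "B", "C")
V <- matrix(c(2, 1, 0,
              1, 2, 0,
              0, 0, 2),
            nrow = 3, byrow = TRUE, dimnames = list(tips, tips))

V_full <- lambda_transform(V, 1)    # full phylogenetic signal
V_half <- lambda_transform(V, 0.5)  # intermediate signal
V_none <- lambda_transform(V, 0)    # no signal: species independent

print(V_half)
```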

# PGLS model on the effects of Ellenberg N on GS
modelo3 <- pgls(Genome_size ~ Ellenberg_N, data=comp.data)
summary(modelo3)  # prints the statistical model results

# PGLS model with the main effects and the interaction of Ellenberg N and Ellenberg F on GS
modelo4 <- pgls(Genome_size ~ Ellenberg_N * Ellenberg_F, data = comp.data)
summary(modelo4)  # returns the statistics for the model

Flow cytometry histograms

[Histogram panels a) to x)]

a) Flow histogram of field-collected Ulex minor superior to Petroselinum crispum with General Purpose Buffer 3PVPBmet (01b)
b) Flow histogram of field-collected Ulex minor superior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1b)
c) Flow histogram of MSB# 11282 Vicia parviflora inferior to Pisum sativum with General Purpose Buffer 3PVPBmet (1a)
d) Flow histogram of MSB# 11282 Vicia parviflora inferior to Pisum sativum with General Purpose Buffer 3PVPBmet (1c)
e) Flow histogram of MSB# 11684 Vicia sativa subsp. nigra inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1a)
f) Flow histogram of MSB# 11684 Vicia sativa subsp. nigra inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1b)
g) Flow histogram of MSB# 130408 Glaux maritima inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1b)
h) Flow histogram of MSB# 31549 Trifolium ochroleucon inferior to Petroselinum crispum in Ebihara Bmet buffer (1a)
i) Flow histogram of MSB# 31549 Trifolium ochroleucon inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1b)
j) Flow histogram of MSB# 31549 Trifolium ochroleucon inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1c)
k) Flow histogram of MSB# 31549 Trifolium ochroleucon inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (2b)
l) Flow histogram of MSB# 31804 Trifolium bocconei inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1a)
m) Flow histogram of MSB# 31804 Trifolium bocconei inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (2a)
n) Flow histogram of MSB# 31804 Trifolium bocconei inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (2b)
o) Flow histogram of MSB# 31804 Trifolium bocconei inferior to Petroselinum crispum with General Purpose Buffer 3PVPBmet (2c)
p) Flow histogram of MSB# 55479 Trifolium suffocatum inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1a)
q) Flow histogram of MSB# 55479 Trifolium suffocatum inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1b)
r) Flow histogram of MSB# 177032 Astragalus alpinus inferior to Pisum sativum with General Purpose Buffer 3PVPBmet (1a)
s) Flow histogram of MSB# 695253 Astragalus danicus inferior to Pisum sativum with General Purpose Buffer 3PVPBmet (1c)
t) Flow histogram of MSB# 7559 Trifolium ornithopodioides inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1a)
u) Flow histogram of MSB# 7559 Trifolium ornithopodioides inferior to Petroselinum crispum in General Purpose Buffer 3PVPBmet (1c)
v) Flow histogram of MSB# 75909 Ornithopus pinnatus inferior to Pisum sativum with Ebihara Bmet buffer (1a)
w) Flow histogram of MSB# 75909 Ornithopus pinnatus inferior to Pisum sativum in General Purpose Buffer 3PVPBmet (1c)
x) Flow histogram of MSB# 970923 Gentiana pneumonanthe superior to Petroselinum crispum with General Purpose Buffer 3PVPBmet (1b)


Measurements of stomata guard cells in UK Fabaceae spp., taken at 20x magnification

Accession 31804 Trifolium bocconei: guard cell length (μm)

Stomata number    Slide 1    Slide 2    Slide 3
1                 14.02      11.94      13.52
2                 10.85      10.33      14.71
3                 13.98      12.97      13.79
4                 17.55      11.99      15.51
5                 11.70       9.47      12.07
6                 14.09      12.44      13.22
7                 11.55      11.04      14.37
8                 13.06      12.48      15.68
9                 11.78      12.07      14.90
10                13.28      15.35      15.52
Average           13.19      12.01      14.33


Accession 7559 Trifolium ornithopodioides: guard cell length (μm)

Stomata number    Slide 1    Slide 2    Slide 3
1                 16.46      15.56      17.40
2                 15.70      17.09      14.24
3                 12.33      14.24      19.77
4                 14.33      14.84      14.91
5                 16.00      15.68      14.00
6                 15.90      17.88      14.42
7                 15.68      15.03      17.34
8                 13.41      16.50      19.77
9                 17.01      16.74      17.49
10                14.84      18.19      16.00
Average           15.07      16.17      16.53


Accession 11282 Vicia parviflora: guard cell length (μm)

Stomata number    Slide 1    Slide 2    Slide 3    Slide 4
1                 28.19      31.50      35.29      28.9
2                 26.53      32.01      28.27      25.48
3                 30.75      30.03      24.81      33.77
4                 29.98      33.18      32.15      29.66
5                 27.44      28.41      35.24      30.39
6                 30.02      28.89      32.31      28.99
7                 34.34      32.08      30.59      35.74
8                 29.88      32.67      26.17      30.81
9                 28.27      24.38      27.72      29.83
10                22.79      27.68      29.44      29.25
Average           28.82      30.08      30.20      30.28

Accession 31549 Trifolium ochroleucon: guard cell length (μm)

Stomata number    Slide 1    Slide 2    Slide 3
1                 12.12      18.56      18.40
2                 11.94      13.63      13.06
3                 11.04      11.03      13.79
4                 12.34      15.15      14.79
5                 11.70      15.15      17.30
6                 11.52      11.91      13.38
7                 16.08      14.73      12.19
8                 14.07      17.80      14.25
9                 11.60      13.83      13.63
10                14.07      16.07      15.14
Average           12.65      14.77      14.60

The UK Fabaceae flora database used in this study

Species full name             1C genome size (pg)   Ellenberg F value   Ellenberg N value
Astragalus alpinus            1.99                  4                   2
Astragalus danicus            0.64                  3                   2
Anthyllis vulneraria          0.5                   4                   2
Astragalus glycyphyllos       0.83                  4                   3
Cytisus scoparius             0.85                  5                   4
Genista pilosa                1.04                  5                   1
Glaux maritima                1.58                  7                   5
Genista tinctoria             1.67                  6                   2
Hippocrepis comosa            1.9                   3                   2
Lathyrus aphaca               6.9                   3                   4
Lathyrus japonicus            6.34                  5                   6
Lathyrus nissolia             6.45                  6                   6
Lathyrus pratensis            4.53                  6                   5
Lathyrus sylvestris           10.5                  4                   2
Lotus angustissimus           0.31                  3                   3
Lotus corniculatus            0.82                  4                   2
Lotus pedunculatus            0.39                  8                   4
Lotus subbiflorus             0.81                  5                   5
Medicago arabica              0.61                  5                   5
Medicago lupulina             0.9                   4                   4
Medicago minima               0.58                  3                   2
Medicago polymorpha           0.56                  4                   5
Onobrychis viciifolia         1.25                  4                   3
Ornithopus pinnatus           0.71                  3                   2
Ononis spinosa                1.43                  4                   3
Oxytropis campestris          4.56                  4                   2
Oxytropis halleri             4.61                  3                   2
Trifolium bocconei            0.9                   4                   2
Trifolium arvense             0.39                  3                   2
Trifolium campestre           0.37                  4                   4
Trifolium dubium              0.73                  4                   5
Trifolium fragiferum          0.54                  7                   6
Trifolium glomeratum          0.39                  3                   2
Trifolium ochroleucon         0.6                   5                   2
Trifolium medium              3.23                  4                   4
Trifolium ornithopodioides    0.88                  6                   3
Trifolium pratense            0.43                  5                   5
Trifolium repens              1.06                  5                   6
Trifolium scabrum             0.56                  3                   2
Trifolium striatum            0.38                  3                   2
Trifolium suffocatum          0.57                  4                   2
Trifolium subterraneum        0.56                  3                   2
Ulex europaeus                3.85                  5                   3
Ulex minor                    3.38                  6                   2
Ulex gallii                   2.9                   6                   2
Vicia bithynica               4.55                  4                   4
Vicia cracca                  2.92                  6                   5
Vicia hirsuta                 3.95                  5                   6
[species name missing]        2.6                   3                   3
Vicia lutea                   7.4                   4                   5
Vicia orobus                  5.3                   5                   4
Vicia sativa                  2.25                  4                   4
Vicia sepium                  4.65                  5                   6
Vicia sylvatica               8.05                  5                   5
Vicia tetrasperma             3.6                   5                   6
Vicia parviflora              2.2                   5                   5