THESIS

CALIFORNIA STATE UNIVERSITY SAN MARCOS

THESIS SIGNATURE PAGE

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE

MASTER OF

SCIENCE IN CHEMISTRY

TITLE: Investigation into the Secondary Metabolite Chemistry of Rhamnus crocea Inside and Outside Hermes Copper Range

AUTHOR(S): Alyssa Dubord

DATE OF SUCCESSFUL DEFENSE: 05/03/2021

THE THESIS HAS BEEN ACCEPTED BY THE THESIS COMMITTEE IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN CHEMISTRY

Jackie Trischman 05/10/2021 COMMITTEE CHAIR SIGNATURE DATE

George Vourlitis 05/12/2021 COMMITTEE MEMBER SIGNATURE DATE

Michael Schmidt 05/17/2021 COMMITTEE MEMBER SIGNATURE DATE

COMMITTEE MEMBER SIGNATURE DATE Dubord

Investigation into the secondary metabolite chemistry of

Rhamnus crocea inside and outside Hermes copper range

Alyssa Dubord

Thesis committee:

Jacqueline Trischman (chair), George Vourlitis, Michael Schmidt

Department of Chemistry and Biochemistry

College of Science, Technology, Engineering and Mathematics

California State University San Marcos

333 S. Twin Oaks Valley Road, San Marcos, CA 92096


TABLE OF CONTENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

INTRODUCTION

Hermes Copper (Lycaena hermes) & Spiny Redberry (Rhamnus crocea)

Secondary metabolites

Hypothesis

METHODS

Previous work

Current work

RESULTS & DISCUSSION

Previous work

Current work

ACKNOWLEDGEMENTS

REFERENCES

APPENDICES


ABSTRACT

Larvae of the Hermes copper butterfly, Lycaena hermes, develop on a single host plant, the spiny redberry, Rhamnus crocea. This plant is rich in flavonoids, two of which are kaempferol and rhamnocrocin, the latter a novel compound. It was hypothesized that rhamnocrocin would exist in higher concentrations in R. crocea inside the L. hermes range than outside that range. LC-MS/MS analysis showed that the most abundant compounds in all plant extracts occurred at [M+H]+ values of m/z 741, 755 and 317. Compound 741 was tentatively named rhamnocrocin in a separate structure elucidation project, Compound 755 appears to be a methylated version of rhamnocrocin, and Compound 317 has been identified as rhamnetin. A two-sample t-test identified no statistically significant difference in concentration in and out of range for any of these compounds. All plant extracts were also analyzed for a kaempferol component that appeared at m/z 286, and approximately 22 different kaempferol-containing compounds were detected in the extracts. A Welch's two-sample t-test showed a statistically significant difference in kaempferol concentration inside and outside the L. hermes range, with higher concentrations outside. Principal component analysis associated sites in range with higher average and minimum monthly temperatures and less precipitation, whereas sites outside range had higher precipitation and higher kaempferol concentrations. L. hermes appears to prefer areas that are lower in precipitation, higher in average and minimum monthly temperature, and lower in kaempferol concentration. These factors may help explain why L. hermes is found in such a small habitat compared with its host plant.


LIST OF TABLES

Table 1. PCA: Compounds (741, 755, 317)

Table 2. PCA: Compounds & climatic variables

Table 3. PCA: Compounds & soil variables

Table 4. PCA: Compounds & foliage variables

Table 5. PCA: Kaempferol & climatic variables

Table 6. PCA: Kaempferol & soil variables

Table 7. PCA: Kaempferol & foliage variables

Table 8. PCA: Kaempferol & Elevation

Table 9. Site names in and out


LIST OF FIGURES

Figure 1. Images of L. hermes and R. crocea

Figure 2. Mapped ranges of L. hermes and R. crocea habitat

Figure 3. Kaempferol structure

Figure 4. Chromatogram and 1H NMR data from old LC-MS/MS method

Figure 5. Chromatogram and mass spectral data of revised LC-MS/MS method

Figure 6. Proposed structure of rhamnocrocin

Figure 7. Mass spectrum for rhamnocrocin analogues

Figure 8. Chromatogram and mass spectrum from EF-2018-01-2-3-9-α

Figure 9. Overlaid kaempferol chromatograms

Figure 10. PCA: Compounds (741, 755, 317)

Figure 11. PCA: Compounds & climatic variables

Figure 12. PCA: Compounds & soil variables

Figure 13. PCA: Compounds & foliage variables

Figure 14. PCA: Kaempferol & climatic variables

Figure 15. PCA: Kaempferol & soil variables

Figure 16. PCA: Kaempferol & foliage variables

Figure 17. PCA: Kaempferol & Elevation

Figure 18. Site maps for R. crocea sampling


INTRODUCTION

Hermes Copper (Lycaena hermes) & Spiny Redberry (Rhamnus crocea):

The Hermes copper butterfly (Lycaena hermes) is endemic to a small coastal sage scrub (CSS) region of Southern California and northern Baja California, Mexico (Marschalek et al., 2016; Figure 1A, B). Specifically, this region is located 80 km north of the US-Mexico border and 70 km east of San Diego, and a few records extend 160 km south into Baja California, Mexico (Thorne, 1963; Emmel and Emmel, 1973). As a sedentary, specialist species, L. hermes is extremely vulnerable to extinction as its environment changes (Marschalek and Deutschman, 2008; Thorne, 1963). A major contributor to L. hermes habitat loss is destruction caused by anthropogenic wildfires; for example, the 2003 Paradise, Cedar, and Mine fires destroyed 39% of the remaining L. hermes populations (Hogan, 2006). Urbanization is another threat to natural habitat, as San Diego is projected to gain one million new residents by 2030 (Marschalek and Deutschman, 2008). The fragmentation of CSS habitat by both wildfires and urbanization puts L. hermes at greater risk of extinction, and the butterfly is thus classified as vulnerable. Several petitions have been made to list the species under the Endangered Species Act, but all have been denied due to insufficient data on biological vulnerability and threats (staff of Carlsbad Fish and Wildlife, 2006). Today, the Hermes copper remains a federal candidate for listing despite its dwindling population.

The spiny redberry (Rhamnus crocea) is the sole larval host plant of L. hermes (Figure 1C, D). This plant is found in coastal and is native to Southern and Northern California as well as Baja California and parts of . This plant's distribution far exceeds the habitat of L. hermes, but for unknown reasons the butterfly is sedentary and does not travel outside its range to areas that may seem suitable (Figure 2; Thorne, 1963). Although the Rhamnus genus has been used in various traditional medicines, very little is known about the chemical ecology of R. crocea (Mai et al., 2001; Vourlitis, 2018).

Figure 1. The undersides of L. hermes wings are bright yellow (A), and the upper sides are yellow-orange with black spots (B). R. crocea (C) is the sole larval host plant of this butterfly, which lays its eggs under new leaf growth (D) (Deutschman et al., 2011).


Figure 2. For unknown reasons, the range of L. hermes, shown in orange (A), is far smaller than the range of its host plant R. crocea, shown in green (B) (Deutschman et al., 2011; Montalvo, 2020).

Secondary metabolites:

Intermediary metabolism consists of the enzyme-driven reactions that build and interconvert the organic compounds an organism needs, including those used to store metabolic energy. These reactions are organized into sequences known as metabolic pathways. Primary metabolism synthesizes and combines molecules such as carbohydrates, proteins, fats and nucleic acids, the primary metabolites needed to sustain life (Dewick, 2009).

In contrast to primary metabolites, secondary metabolites, often referred to as natural products, have a limited distribution among organisms. They often give an organism a competitive advantage but are not essential to maintain life. Although the purposes of many natural products are still being discovered, some are known to act as toxins that ward off predators, as volatile compounds that attract members of the same or other species, or as coloring agents that attract or ward off other species. Natural products are produced from primary metabolites by enzymes, and a great diversity of natural products can be constructed from these building blocks (Dewick, 2009).

The synthesis and abundance of many secondary metabolites are influenced by environmental conditions and by stresses to which organisms are subjected (Ramakrishna and Ravishankar, 2011; Yang et al., 2018). Such conditions include light, temperature, soil salinity and drought. The Rhamnaceae family is known to produce characteristic secondary metabolites such as triterpenes, cyclopeptide alkaloids, benzylisoquinoline alkaloids, and flavonoids (Alarcón and Cespedes, 2015). The last of these groups, the flavonoids, is the focus of this investigation.

Through evolution, insects have come to recognize and respond to chemical signals from their host plants, with differing volatile cues eliciting specialized responses such as egg laying (Bruce, 2014). The choice of oviposition site is crucial for Lepidoptera because it affects offspring survival and thus population survival (Garcia-Barros and Fartmann, 2009; Reisenman et al., 2010). In addition, larvae have been shown to sequester compounds from the leaves of their host plant for their own benefit. Butterflies of the Lycaenidae family, to which L. hermes belongs, have been shown to sequester flavonoids (Wiesen et al., 1994). This sequestration has been suggested to contribute to wing patterning and thus to species recognition and intraspecific visual communication (Wiesen et al., 1994). Kaempferol 3-O-glucoside was the most abundant flavonoid in larvae, pupae and imagines (adults), and accounted for ~83–92% of all soluble flavonoids in adult butterflies (Wiesen et al., 1994). Kaempferol is a flavonoid commonly found in edible plants as well as in plants used in traditional medicine (Calderon-Montano et al., 2011) (Figure 3).


Flavonoids are a class of polyphenolic secondary metabolites with a C6-C3-C6 carbon framework (Samanta et al., 2011). They are biosynthesized through the shikimate pathway and the citric acid cycle from the precursors phenylalanine and malonyl-CoA (Samanta et al., 2011). They are located in the cell vacuoles of green plants and play an important role in a plant's interactions with its environment and in protecting it against various biotic and abiotic stresses (Samanta et al., 2011).

Hypotheses:

L. hermes remains sedentary in its small, fragmented habitat despite the presence of host plants outside this habitat that seem suitable. This raised the question: what causes the sedentary behavior of this butterfly, and could the answer lie in the leaf chemistry of its host plant? To better understand the selective reproductive habits of L. hermes, the chemical composition of R. crocea leaves was investigated (Isbell, 2020). Leaves of R. crocea were harvested from 30 different locations in Southern California (Isbell, 2020): 15 sites where L. hermes is known to lay its eggs and 15 sites where it has not been known to do so (Vourlitis, 2018). This collection was used to determine whether R. crocea leaves in range differ in chemical composition or foliage variables from those out of range (Isbell, 2020). Climatic and soil variables were also examined to determine whether any of these factors may influence the attraction or deterrence of L. hermes (Isbell, 2020). This analysis led to the discovery of a new compound, tentatively named rhamnocrocin, with the molecular formula C33H40O19. Based on its kaempferol backbone, rhamnocrocin is classified as a flavonol (a 3-hydroxyflavone), a subclass of flavonoids (Samanta et al., 2011). It was predicted that R. crocea within the range of L. hermes would contain different concentrations of rhamnocrocin than R. crocea outside the range. In addition, variation in leaf N, C, lignin, soluble sugar and holocellulose, soil N and C, and temperature and precipitation data were used to look for correlations with the concentrations of rhamnocrocin and other abundant compounds between sites in and out of range.

Figure 3. The chemical structure of kaempferol. This secondary metabolite forms the backbone of the rhamnocrocin molecule.

METHODS

Previous work:

Total N, total C, C:N, lignin:N and H2O variables:

In 2018, following the L. hermes flight season, ~2-6 R. crocea shrubs were sampled at each of 30 different locations in Southern California to measure total N, total C, C:N, lignin:N and H2O content (Vourlitis, 2018). From each shrub, old- and new-growth leaves and stems were collected, as well as surface soil samples (Vourlitis, 2018).


Chemical analysis:

Dried R. crocea leaf tissue was crushed, weighed and extracted by submersion in 75:25 methanol:dichloromethane for 24 hours in the dark. The supernatant was decanted and its solvent was evaporated using a rotary evaporator. Each extract was then partitioned between isooctane and methanol into nonpolar and polar fractions. The nonpolar isooctane fraction was analyzed by gas chromatography-mass spectrometry (GC-MS), and the polar fraction was analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS).

Current work:

Solvent was removed in vacuo from all purified plant extracts; each dried extract was then weighed and dissolved in LC-MS grade acetonitrile to a concentration of 3 mg/mL.

LC-MS/MS:

The electrospray ionization source of the Agilent 6410 triple quadrupole mass spectrometer used in this study was operated in positive ion mode. The nitrogen nebulizing gas temperature was set to 300 °C. An injection volume of 3 µL and a gradient mobile phase of acetonitrile with 0.1% formic acid and water with 0.1% formic acid were used for all acquisitions. Data were acquired for 25 min at a flow rate of 0.300 mL/min. For scan mode, an Agilent ZORBAX SB-C18 column (1.8 µm, 2.1 × 50 mm) was installed. For product ion mode (PIM) and multiple reaction monitoring (MRM), a new column was installed. Both columns were kept at ambient temperature.


The mass spectrometer parameters varied with the acquisition mode. In scan mode, a mass range of m/z 50-1200 was scanned over 500 ms. For PIM scans, a mass range of m/z 100-800 for Plant 741, Plant 755 and Plant 777, and m/z 100-400 for Plant 317, was scanned over 500 ms; collision energies were set at 30, 10, 0 and 50 V for Plant 317, Plant 741, Plant 755 and Plant 777, respectively. The same collision energies were used in MRM mode, with a 500 ms dwell time for each precursor → product ion transition. In selected ion monitoring (SIM) mode, a mass range of m/z 50-400 was scanned over 500 ms for kaempferol at m/z 287 and the kaempferol fragment at m/z 286.

HPLC:

An Agilent Technologies 1260 Infinity HPLC was coupled to the mass spectrometer. Data analysis began at 0.1 min. The mobile phase consisted of solvent A (0.1% formic acid in water) and solvent B (0.1% formic acid in acetonitrile). The gradient elution was as follows: 10-20% B from 0.1-2 min; 20-95% B from 2-7 min; 95-99% B from 7-12 min; 99-5% B from 12-14 min; 5-10% B from 14-15 min; and re-equilibration from 15-25 min. UV absorbance was recorded at 254 nm and 215 nm.
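The gradient program above can be read as a piecewise-linear function of time. The sketch below encodes the stated breakpoints; the hold at 10% B before 0.1 min and throughout the 15-25 min re-equilibration is an assumption, because the text gives only the time window of that step, and the names `GRADIENT` and `percent_b` are illustrative rather than part of any instrument software.

```python
# Sketch of the HPLC gradient as (time in min, %B) breakpoints taken from the text.
# Assumption: %B is held at 10 % before 0.1 min and throughout the 15-25 min
# re-equilibration; the method description does not state these holds explicitly.
GRADIENT = [
    (0.1, 10.0),
    (2.0, 20.0),
    (7.0, 95.0),
    (12.0, 99.0),
    (14.0, 5.0),
    (15.0, 10.0),
    (25.0, 10.0),
]


def percent_b(t_min: float) -> float:
    """Return the programmed %B (0.1 % formic acid in acetonitrile) at t_min minutes."""
    if t_min <= GRADIENT[0][0]:
        return GRADIENT[0][1]
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            # Linear ramp between consecutive breakpoints.
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # After the last breakpoint, hold the final composition.
    return GRADIENT[-1][1]


if __name__ == "__main__":
    for t in (0.1, 1.0, 4.5, 10.0, 13.0, 20.0):
        print(f"{t:5.1f} min -> {percent_b(t):5.1f} %B")
```

Reading the program this way makes the steep 12-14 min drop from 99% to 5% B explicit, which is where late-eluting, less polar compounds are flushed before the column returns to starting conditions.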

Statistics:

Statistical tests were performed in Minitab 18. The assumptions that samples were independent and randomly obtained were met. Values lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile were treated as outliers and removed from the data set. Tests for normality and homogeneity of variance (HOV) were performed before testing for significance; HOV was assessed with Levene's test. A two-sample t-test was then performed to determine whether compound concentrations differed between ranges. The null hypothesis was that samples within the L. hermes range and samples outside the range had no difference in compound concentration. When p<0.05, the null hypothesis was rejected in favor of the alternative hypothesis; when p≥0.05, the null hypothesis was retained.
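The workflow above (IQR outlier removal, Levene's test for HOV, then a pooled or Welch two-sample t-test, with the null hypothesis rejected when p < 0.05) can be sketched in plain Python. This is not the Minitab 18 implementation: the quartile convention, the median-centred (Brown-Forsythe) form of Levene's test and the rule of switching to Welch's test when the Levene p-value is below 0.05 are assumptions, the normality check is omitted, and all function names are hypothetical. p-values are computed from the regularized incomplete beta function using a standard continued-fraction evaluation.

```python
import math
import statistics

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction used by the regularized incomplete beta function."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1 means converged
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t with df degrees of freedom."""
    return betainc(df / 2.0, 0.5, df / (df + t * t))

def f_upper_p(f, d1, d2):
    """Upper-tail p-value P(F > f) for an F(d1, d2) distribution."""
    return betainc(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))

def remove_iqr_outliers(values):
    """Drop values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if q1 - fence <= v <= q3 + fence]

def levene_median(a, b):
    """Median-centred (Brown-Forsythe) Levene statistic W and its p-value."""
    z = []
    for g in (a, b):
        med = statistics.median(g)
        z.append([abs(v - med) for v in g])
    n, k = len(z[0]) + len(z[1]), 2
    grand = (sum(z[0]) + sum(z[1])) / n
    means = [statistics.fmean(g) for g in z]
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    within = sum((v - m) ** 2 for g, m in zip(z, means) for v in g)
    w = (n - k) / (k - 1) * between / within
    return w, f_upper_p(w, k - 1, n - k)

def two_sample_t(a, b, equal_var):
    """Pooled (equal_var=True) or Welch two-sample t-test; returns (t, df, p)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    if equal_var:
        df = na + nb - 2
        pooled = ((na - 1) * va + (nb - 1) * vb) / df
        se = math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    else:  # Welch: unpooled standard error, Welch-Satterthwaite df
        sa, sb = va / na, vb / nb
        se = math.sqrt(sa + sb)
        df = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, df, t_two_sided_p(t, df)

def compare_ranges(in_range, out_range, alpha=0.05):
    """Outlier removal, HOV check, t-test, then the p < alpha decision rule."""
    a, b = remove_iqr_outliers(in_range), remove_iqr_outliers(out_range)
    _, p_hov = levene_median(a, b)
    t, df, p = two_sample_t(a, b, equal_var=p_hov >= alpha)
    return t, df, p, ("reject H0" if p < alpha else "retain H0")

if __name__ == "__main__":
    # Hypothetical peak areas (NOT thesis data), only to exercise the functions.
    in_range = [2.1, 2.6, 2.4, 2.9, 2.2, 2.7, 2.5]
    out_range = [5.8, 7.1, 6.2, 6.9, 5.5, 7.4, 6.6]
    print(compare_ranges(in_range, out_range))
```

With real data, the in-range and out-of-range peak areas for one compound would replace the hypothetical lists. For routine work, SciPy's `scipy.stats.levene` (median-centred by default) and `scipy.stats.ttest_ind` with `equal_var=False` provide the same tests without hand-written special functions.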

To further analyze the data, principal component analysis (PCA) plots were generated using the programming language R in the integrated development environment RStudio. PCA is a dimensionality reduction technique that emphasizes variation and pattern by projecting a data set with many dimensions onto a 2-dimensional plot (Powell, 2021; Karkare, 2021). The axes, called “principal components,” are derived from the eigenvectors of the data's covariance (or correlation) matrix, and the corresponding eigenvalues give the variance captured by each axis. The components have no direct physical meaning; each is a linear combination of the original variables (Powell, 2021). The horizontal axis, PC1, contains the most variation, and the vertical axis, PC2, contains the second-most variation (Powell, 2021). PC3 and higher components explain progressively less variation and are therefore typically dropped from the graphical representation (Powell, 2021). The farther a variable's arrow extends from the origin, the greater the contribution of that variable to the principal components (Fei et al., 2017).
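As a rough illustration of how eigenvalues, eigenvectors and arrow lengths relate, the following sketch standardizes each variable, builds the correlation matrix, and extracts components by power iteration with deflation. Power iteration is a swapped-in technique chosen to keep the example dependency-free; the thesis does not state which R function produced its plots, and R's `prcomp()`, for example, uses singular value decomposition and may return components with flipped signs. The correlation of a variable with a component (its arrow coordinate in a variable correlation plot) is computed as the eigenvector entry times the square root of the eigenvalue. The data and function names are hypothetical.

```python
import math
import statistics


def standardize(columns):
    """Center each variable and scale it to unit variance."""
    out = []
    for col in columns:
        mean, sd = statistics.fmean(col), statistics.stdev(col)
        out.append([(v - mean) / sd for v in col])
    return out


def correlation_matrix(z):
    """Correlation matrix of standardized variables (each row of z is one variable)."""
    n, p = len(z[0]), len(z)
    return [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def power_iteration(mat, iters=5000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    p = len(mat)
    start = [1.0 + 0.1 * i for i in range(p)]  # arbitrary non-symmetric start vector
    norm0 = math.sqrt(sum(x * x for x in start))
    v = [x / norm0 for x in start]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        converged = max(abs(x - y) for x, y in zip(w, v)) < tol
        v = w
        if converged:
            break
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def pca(columns, n_components=2):
    """Return [(eigenvalue, proportion of variance, eigenvector)] and sample scores."""
    z = standardize(columns)
    r = correlation_matrix(z)
    p = len(r)
    total = sum(r[i][i] for i in range(p))  # trace; equals p for a correlation matrix
    comps = []
    for _ in range(n_components):
        lam, vec = power_iteration(r)
        comps.append((lam, lam / total, vec))
        # Deflation: subtract the component found so the next pass finds the next one.
        r = [[r[i][j] - lam * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    scores = [[sum(vec[i] * z[i][k] for i in range(p)) for _, _, vec in comps]
              for k in range(len(z[0]))]
    return comps, scores


def variable_pc_correlations(comps):
    """Pearson correlation of each standardized variable with each PC: v_ij * sqrt(lambda_j)."""
    p = len(comps[0][2])
    return [[vec[i] * math.sqrt(max(lam, 0.0)) for lam, _, vec in comps] for i in range(p)]


if __name__ == "__main__":
    # Hypothetical measurements for three compounds across eight sites (NOT thesis data).
    x741 = [2.0, 3.1, 4.2, 5.0, 6.3, 7.1, 8.4, 9.0]
    x755 = [1.5, 1.9, 3.3, 3.0, 4.6, 4.4, 6.1, 5.8]
    x317 = [7.2, 5.1, 6.8, 4.0, 5.5, 3.2, 4.9, 2.5]
    comps, _ = pca([x741, x755, x317])
    for j, (lam, prop, _) in enumerate(comps, start=1):
        print(f"PC{j}: eigenvalue {lam:.3f}, {100 * prop:.1f}% of variance")
```

In this framing, the eigenvalue proportions play the role of the "percentages of explained variability" in a PCA summary table, and the variable-PC correlations are the Pearson coefficients that set arrow length and direction, which is why a longer arrow signals a larger contribution.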

RESULTS & DISCUSSION

Previous work:

Previous extraction work using LC-MS/MS and GC-MS revealed several soluble carbon-based compounds present in large quantities in R. crocea leaves. Further LC-MS/MS analysis found that, of the 49 compounds detected, 10 were present at statistically different concentrations between sites (Isbell, 2020). Three of these 10 compounds may be alkaloids: alkaloid 1, alkaloid 2 and alkaloid 3 (Isbell, 2020). Alkaloids 1 and 3 were found in higher concentrations within the L. hermes range, and alkaloid 2 was higher outside the range (Isbell, 2020). Compounds that appear to be terpenes and iridoids were found in higher concentrations outside the L. hermes habitat (Isbell, 2020). Two sugars were also detected, glu 1 and glu 2: glu 1 was found in higher concentrations outside range and glu 2 in higher concentrations within range. Flavonoids similar to rhamnocrocin were found in significantly higher quantities outside the L. hermes habitat (Isbell, 2020). Tocopherol was found in significantly higher quantities within range (Isbell, 2020). Compounds that appeared to be peptides were also measured in higher concentrations outside the range (Isbell, 2020).

Current work:

LC-MS/MS:

The previous work contained discrepancies between LC-MS/MS and 1H NMR data. The LC-MS/MS data showed multiple peaks from several different compound classes with no major peak, indicating a sample with many components (Figure 4A). However, 1H NMR data revealed the same sample to be a nearly pure compound or a small set of similar compounds, all containing a kaempferol backbone (Figure 4B). These results prompted a close examination and revision of the LC-MS/MS method. Most Rhamnus species contain metabolites with formula weights below 600 amu. However, the 13C NMR data suggested that three glycosyl units were likely attached to the kaempferol backbone, rather than the typical two sugar moieties. This called for an expansion of the m/z range in the LC-MS/MS method. The new method revealed a major, very polar peak that eluted from the column with a retention time (tR) of less than 1 minute (Figure 5A). This major peak exhibited a parent ion at m/z 741 (Figure 5B), corresponding to a compound with a molecular weight of 740 amu, consistent with the pure structure observed by 1H NMR. An extensive literature search found no previous report of this compound, so it was deemed novel and has been tentatively named rhamnocrocin (Figure 6). Based on its kaempferol backbone, rhamnocrocin is classified as a flavonol (a 3-hydroxyflavone), a subclass of flavonoids (Samanta et al., 2011; Figure 3B). In this compound, one hydroxyl group of kaempferol is replaced by a glycosidic linkage that attaches three consecutive β-glycoside units (Figure 6). Flavonoids most commonly occur as glycosides, whose many hydroxyl groups make them fairly water soluble. They also commonly carry multiple methyl groups and isopentyl units, which can make them substantially more lipophilic (Samanta et al., 2011).

Further LC-MS/MS analysis of all plant extracts revealed rhamnocrocin and two analogues of rhamnocrocin in amounts large enough to be identified. Rhamnocrocin itself had a tR ≈ 0.598-0.740 min (Figure 7A). This was identified by analysis of a pure sample of rhamnocrocin, named EF-2018-01-2-3-9-α, obtained after a series of three purification steps. A second compound, clearly related to rhamnocrocin, had a tR ≈ 6.288-7.466 min (Figure 7B). Because this form has the same molecular weight but a different retention time, it is an isomer that differs in polarity and therefore elutes from the column later. The third analog of rhamnocrocin has a parent ion at m/z 755 with a tR ≈ 7.667-8.673 min (Figure 7C). A mass difference of 14 corresponds to the addition of a methyl group (a net gain of CH2) to the molecule, so this analog is the methylated form of rhamnocrocin (Figure 7C). This was confirmed with 1H NMR data from various fractions of the large-sample workup.
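The m/z arithmetic behind these assignments can be checked from elemental formulas. The sketch below computes monoisotopic masses from standard isotope masses and adds one proton (a hydrogen atom minus an electron) to obtain [M+H]+. The formula used for the methylated analogue, C34H42O19 (rhamnocrocin plus CH2), is an inference from the 14-unit shift rather than a measured formula, and the function names are illustrative.

```python
# Monoisotopic masses (Da) of the most abundant isotopes of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}
ELECTRON = 0.00054857990946  # electron mass (Da)


def mono_mass(formula):
    """Monoisotopic neutral mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def mh_plus(formula):
    """m/z of the singly charged protonated ion [M+H]+."""
    return mono_mass(formula) + MONOISOTOPIC["H"] - ELECTRON


RHAMNOCROCIN = {"C": 33, "H": 40, "O": 19}
# Inferred, not measured: methylation replaces an H with CH3, a net gain of CH2.
METHYL_RHAMNOCROCIN = {"C": 34, "H": 42, "O": 19}
KAEMPFEROL = {"C": 15, "H": 10, "O": 6}

if __name__ == "__main__":
    for name, f in [("rhamnocrocin", RHAMNOCROCIN),
                    ("methylated rhamnocrocin", METHYL_RHAMNOCROCIN),
                    ("kaempferol", KAEMPFEROL)]:
        print(f"{name}: M = {mono_mass(f):.4f}, [M+H]+ = {mh_plus(f):.4f}")
    shift = mono_mass(METHYL_RHAMNOCROCIN) - mono_mass(RHAMNOCROCIN)
    print(f"methylation shift = {shift:.4f} Da")  # mass of CH2
```

Rounded to unit resolution, as on a triple quadrupole, these values reproduce the observed m/z 741 and 755 and the kaempferol ion at m/z 287.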


Figure 4. The chromatogram from the original LC-MS/MS method (A) contained several peaks of interest, indicating the presence of multiple compounds within the plant extract. However, the 1H NMR spectrum (B) indicated only one pure compound. These instrumental discrepancies led to revision of the LC-MS/MS method.


Figure 5. The chromatogram from the revised method (A) showed one strong peak, representing the single compound or group of closely related compounds suggested by 1H NMR. MS results in scan mode showed a parent ion at m/z 741, corresponding to the rhamnocrocin molecule (B).

Figure 6. The proposed structure of the novel rhamnocrocin molecule determined by 2D NMR.


Figure 7. Three analogs of rhamnocrocin have been discovered so far: the original, its isomer and a methylated version, all with differing retention times and with parent ions at m/z 741 (A), 741 (B) and 755 (C), respectively.


Figure 8. To confirm the tR of the original rhamnocrocin, a pure sample obtained by HPLC, EF-2018-01-2-3-9-α, was run on LC-MS/MS. The resulting chromatogram (A) showed rhamnocrocin to have a tR ≈ 0.598-0.740 min. The corresponding mass spectrum (B) showed a parent ion at m/z 741.3, indicating a molecular weight of 740.

A triple quadrupole mass analyzer like the one used in this study can be thought of as two moving belts stacked parallel to each other with a collision cell in the middle. The top belt, representing the first quadrupole, is set to select the precursor ions, which travel to the collision cell to be fragmented. The collision cell is a hexapole with six rods and uses nitrogen as the collision gas. Varying voltages are applied to both the collision cell and the quadrupoles to move the product ions to the third quadrupole, represented by the bottom belt. The fragments are filtered through the third quadrupole, producing a product-ion MS/MS spectrum that serves as a compound's fingerprint. Selected reaction monitoring (SRM) mode is used when a specific precursor ion and a specific product ion are monitored. When multiple SRM transitions are monitored in the same run, the mode is called multiple reaction monitoring (MRM) (Agilent Technologies, 2012).

Using the scan-mode mass spectra of all extracts, the most common peaks throughout the samples were investigated. The most common compounds in all samples had parent ions at m/z 317, 741, 755 and 777. Ions 741 and 755 correspond to the rhamnocrocin molecule and its methylated form, respectively. Each 755 peak was accompanied by a 777 peak; the 755 peak was always strong, and the 777 peak was always weaker. The fragmentation patterns of 755 and 777 were almost identical, which means that 777 most likely has the rhamnocrocin backbone with something added to it. The difference of 22 m/z between 777 and 755 gives the mass of whatever was added. One possibility with this mass is sodium: a sodium adduct, [M+Na]+, is 22 mass units heavier than the protonated ion, [M+H]+, of the same molecule, because Na (23) takes the place of H (1). Sodium is a common contaminant in positive ESI mass spectrometry and typically originates from buffers used in other MS work or from contaminated solvents or glassware. To check this, a scan from m/z 100-1000 was performed in negative ion mode. The 777 peak disappeared from the resulting chromatogram, which strongly suggested that this peak corresponded to a sodium adduct. If this is true, all 777 peaks correspond to the same organic constituent as the 755 peaks, so the 777 peak areas were added to the 755 peak areas prior to statistical analysis. The 317 compound is most likely rhamnetin, a known compound in the Rhamnaceae family. It is a flavonol with the molecular formula C16H12O7.
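The sodium-adduct interpretation can be checked numerically. For the same neutral molecule M, [M+Na]+ and [M+H]+ differ by the mass of Na minus the mass of H, about 21.98, which rounds to the observed 22. The sketch below also recomputes the expected [M+H]+ of rhamnetin (C16H12O7). The methylated rhamnocrocin formula C34H42O19 is again inferred from the 14-unit shift, and the function names are illustrative.

```python
# Monoisotopic masses (Da); a singly charged [M+X]+ ion adds atom X and loses one electron.
MASS = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956, "Na": 22.98976928}
ELECTRON = 0.00054857990946


def mono_mass(formula):
    """Monoisotopic neutral mass of {element: count}."""
    return sum(MASS[el] * n for el, n in formula.items())


def adduct_mz(formula, adduct):
    """m/z of a singly charged [M+X]+ ion, where X is 'H' or 'Na'."""
    return mono_mass(formula) + MASS[adduct] - ELECTRON


METHYL_RHAMNOCROCIN = {"C": 34, "H": 42, "O": 19}  # inferred from the +14 shift
RHAMNETIN = {"C": 16, "H": 12, "O": 7}

if __name__ == "__main__":
    mh = adduct_mz(METHYL_RHAMNOCROCIN, "H")
    mna = adduct_mz(METHYL_RHAMNOCROCIN, "Na")
    print(f"[M+H]+ = {mh:.4f}, [M+Na]+ = {mna:.4f}, difference = {mna - mh:.4f}")
    print(f"rhamnetin [M+H]+ = {adduct_mz(RHAMNETIN, 'H'):.4f}")
```

Because the electron mass cancels in the difference, the 755/777 spacing depends only on the Na and H atomic masses, which is why the same 22-unit gap would appear for any molecule forming both ions.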


Using SIM mode, each plant extract was also analyzed for the kaempferol fragment at m/z 286 and the kaempferol ion at m/z 287. The resulting chromatograms showed kaempferol-containing peaks at 22 different retention times (Figure 9), indicating that at least 22 different kaempferol-containing components exist in the R. crocea plant extracts. These could correspond to many compounds that, like rhamnocrocin, have kaempferol as a backbone. In plant biosynthesis, enzymes create new compounds by adding different functional groups to existing molecules. Natural products can be biosynthesized by combining several building blocks obtained from primary metabolites or from a mixture of different building blocks (Dewick, 2009). These combinations produce vast structural diversity, which can be seen in the overlaid chromatograms of the various kaempferol-containing compounds in Figure 9.

Figure 9. Overlaid kaempferol chromatograms, each color representing a different sample extract. Different retention times among peaks correspond to different kaempferol-containing compounds.

Statistics:

Abundance was measured as the integrated ion current at the specified m/z over time. The measured abundance of peak 741, corresponding to rhamnocrocin, in all in-range and out-of-range sample extracts appeared to be normally distributed (p-value=0.186). The HOV test was passed, indicating equal variances between in-range and out-of-range samples (p-value=0.984). The two-sample t-test gave a p-value>0.05, so the null hypothesis was retained (p-value=0.663).

The measured abundance of compound 755, the methylated rhamnocrocin molecule, passed the test for normality (p-value=0.441) and the HOV test (p-value=0.595), indicating equal variances. A two-sample t-test gave a p-value>0.05, so the null hypothesis was retained (p-value=0.765).

The measured abundance of compound 317 passed both the normality and HOV tests (p-value=0.139 and p-value=0.811, respectively). The two-sample t-test gave a p-value>0.05, so the null hypothesis was retained (p-value=0.505).

The abundance of the kaempferol fragment (at m/z 286) did not follow a normal distribution (p-value=0.033). The HOV test was not passed, indicating unequal variances (p-value=0.008). Because of this, Welch's t-test was used in place of the standard two-sample t-test. This test gave a p-value<0.05, so the null hypothesis was rejected (p-value=0.015).

Compounds 741, 755 and 317 all had p-values>0.05. Thus, there was no statistically significant difference in rhamnocrocin concentrations (compound 741), methylated rhamnocrocin concentrations (compound 755) or rhamnetin concentrations (compound 317) between R. crocea leaf extracts in and out of the L. hermes range. In contrast, rejection of the null hypothesis for the kaempferol fragment indicates statistically different kaempferol concentrations inside and outside the L. hermes range. Average total chromatographic peak areas of 2.54 × 10⁶ in range and 6.50 × 10⁶ out of range show that concentrations of kaempferol-containing compounds are higher in R. crocea plants outside the range of L. hermes.


PCA:

Previously obtained measurements of climate, soil and foliage variables were examined

to better understand when and why compounds 741, 755, 317 and kaempferol may be

biosynthesized. Climate variables included average monthly temperature, maximum average

monthly temperature, minimum average monthly temperature and total precipitation.

Soil variables included estimated N deposition, pH, total inorganic N, total N, total C and the C:N ratio. Nitrogen deposition is the input of N from the atmosphere; because many terrestrial ecosystems are N-limited, they may benefit from additional N inputs through increased biomass production (Stevens et al., 2018). However, too much N can cause toxicity (Stevens et al., 2018). Both organic and inorganic forms of N occur in the soil (Li et al., 2014). The main inorganic forms are ammonium N (NH₄⁺-N) and nitrate N (NO₃⁻-N), which are necessary for direct plant uptake and biomass production (Li et al., 2014). Soil organic C and total N are used to estimate soil quality through their C:N ratio (Zhijing and Shaoshan, 2018). Soil pH directly controls factors such as microorganism activity and nutrient solubility and availability (Gentili et al., 2018). In acidic soils, micronutrients are more available to plants but can become toxic when in excess (Gentili et al., 2018). Alkaline soils increase macronutrient availability, but phosphorus and micronutrient availability are reduced (Gentili et al., 2018).

Foliage variables included leaf N, leaf C, leaf C:N, lignin, and holocellulose. Leaf N, C and C:N ratio also indicate the health and composition of the foliage. Carbon and N are critical for plant functions such as energy flow and nutrient cycling (Zhang et al., 2019). A high C:N is an indicator of N-use efficiency, while foliage with a low C:N is typically decomposed faster by microbes, cycling the N back into the ecosystem (Zhang et al., 2019). According to Rowell et al., lignins are amorphous, with a structure typically consisting of aromatic polymers of phenylpropane units. They serve as encrusting agents in the cellulose/hemicellulose matrix and act as a type of adhesive for the plant cell wall. Holocellulose is the combination of cellulose (a glucan polymer of D-glucopyranose units linked by β-glucosidic bonds) and hemicellulose (polysaccharide polymers containing multiple sugar units). Holocellulose makes up the carbohydrate portion of most plants and accounts for ~65-70% of a plant's dry weight. Its component sugars carry many hydroxyl groups, which help the plant absorb moisture via hydrogen bonding.

Figure 10 contains a PCA of Compounds 741, 755 and 317. PC1 and PC2 explain 52.36% and 30.37% of the variation in the data, respectively, with cumulative explained proportions of 52.4% and 82.7%, respectively (Figure 10, Table 1).

Pearson correlation coefficients showed that PC1 best explains Compound 755 and Compound 317: these values increase with decreasing PC1 (Table 1). PC2 best explains Compound 741: this value increases with increasing PC2 (Table 1). Compound 317 is inversely related to Compound 755, meaning that when one concentration goes up, the other goes down. Compounds 741 and 317 have a weak positive correlation. Compound 741 appears to be at a right angle to Compound 755, showing no correlation; in particular, there is no apparent decrease in rhamnocrocin concentration when more methylated rhamnocrocin is found. There is no clear clustering of the in-range (red) and out-of-range (blue) data points.
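The angle-based reading of Figure 10 can be made numerical. In a PCA biplot, the cosine of the angle between two variable arrows approximates the correlation between those variables, to the extent that PC1 and PC2 capture them. The Python sketch below assumes the arrows are drawn from the unscaled PC1 and PC2 coefficients in Table 1B; the scaling actually used by the plotting software is not stated, so the angles are indicative only, and the helper names are illustrative. With these coefficients, Compounds 741 and 755 come out at roughly 79° (close to a right angle), Compounds 317 and 755 at roughly 155° (inversely related), and Compounds 741 and 317 at roughly 76° (weakly positive).

```python
import math

# PC1 and PC2 coefficients transcribed from Table 1B.
loadings = {
    "Compound 741": (0.406, -0.886),
    "Compound 755": (-0.619, -0.446),
    "Compound 317": (0.672, 0.124),
}

def arrow_cosine(a, b):
    """Cosine of the angle between two biplot arrows in the PC1-PC2 plane."""
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))

def arrow_angle_deg(a, b):
    """Angle between two arrows in degrees (0 = same direction, 180 = opposite)."""
    return math.degrees(math.acos(arrow_cosine(a, b)))

pairs = [
    ("Compound 741", "Compound 755"),
    ("Compound 317", "Compound 755"),
    ("Compound 741", "Compound 317"),
]
for x, y in pairs:
    cos_xy = arrow_cosine(loadings[x], loadings[y])
    angle_xy = arrow_angle_deg(loadings[x], loadings[y])
    print(f"{x} vs {y}: cos = {cos_xy:+.2f}, angle = {angle_xy:.0f} deg")
```

Because PC3 still carries 17.3% of the variance (Table 1A), an angle computed from two components can differ from the full correlation between two compounds.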


Figure 10. Variation of Compounds 741, 755 and 317 shown through PCA. There is no clear clustering of data points in and out of range. Compound 317 (Comp 317) and Compound 741 (Comp 741) have a weak positive correlation. Compound 317 is negatively correlated with Compound 755 (Comp 755). Compounds 755 and 741 appear to be at a right angle, showing no correlation with each other. PC1 and PC2 combined explain a total of 82.73% of the variation in the data.

Table 1. Eigenvalues and proportions of explained variability (A). Pearson correlation coefficients between compounds for the first two principal components (PC) (B). PC1 best explains Compound 755 and Compound 317: these values increase with decreasing PC1. PC2 best explains Compound 741: these values increase with increasing PC2.

A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 1.5707 | 0.524 | 0.524 |
| 2 | 0.9110 | 0.304 | 0.827 |
| 3 | 0.5184 | 0.173 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Compound 741 | 0.406 | -0.886 |
| Compound 755 | -0.619 | -0.446 |
| Compound 317 | 0.672 | 0.124 |
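The Proportion and Cumulative columns in these tables follow directly from the eigenvalues: each component's proportion is its eigenvalue divided by the sum of all eigenvalues, and the cumulative column is the running total. Because the eigenvalues in Table 1A sum to approximately 3, the number of variables, the analysis appears to have been run on the correlation matrix (standardized variables); that is an inference, not something stated in the text. The sketch below recomputes the columns for Table 1A and, as a second example, for the kaempferol and climate eigenvalues reported in Table 5A. The function name is illustrative.

```python
from itertools import accumulate

def explained_variance(eigenvalues):
    """Return (proportions, cumulative proportions) for a list of PCA eigenvalues."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    cumulative = list(accumulate(proportions))
    return proportions, cumulative

# Eigenvalues transcribed from Table 1A (Compounds 741, 755 and 317).
table1_eigenvalues = [1.5707, 0.9110, 0.5184]
# Eigenvalues transcribed from Table 5A (kaempferol fragment and climatic variables).
table5_eigenvalues = [3.1108, 1.1280, 0.6842, 0.0675, 0.0094]

for name, eigenvalues in [("Table 1A", table1_eigenvalues), ("Table 5A", table5_eigenvalues)]:
    proportions, cumulative = explained_variance(eigenvalues)
    print(name)
    for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
        print(f"  PC{k}: proportion = {p:.3f}, cumulative = {c:.3f}")
```

For Table 5A this gives a PC1 proportion of 0.622 and a PC1 + PC2 cumulative proportion of 0.848.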

Figure 11 contains a PCA of Compounds 741, 755 and 317 and the climatic variables of average monthly temperature (Ave. T_C (Wt.)), maximum average monthly temperature (Max. T_C (Wt.)), minimum average monthly temperature (Min. T_C (Wt.)) and total precipitation (Total_mm (Wt.)). PC1 explains 42.45% of the variance extracted from the data set, while PC2 explains 24.16% (Figure 11, Table 2). The cumulative explained proportions for PC1 and PC2 were 42.4% and 66.6%, respectively (Table 2). Pearson correlation coefficients showed that PC1 best explains Compound 741, Ave. T_C (Wt.), Min. T_C (Wt.) and Total_mm (Wt.): these values increase with decreasing PC1 (Table 2A). PC2 best explains Compound 755, Compound 317 and Max. T_C (Wt.): these values increase with increasing PC2 (Table 2). There is clear clustering that separates leaf samples obtained from inside L. hermes habitat from those obtained outside the habitat. The left side of the plot, containing average temperature and minimum temperature, contains mainly samples obtained from inside L. hermes range. The right side of the plot, containing total precipitation, contains mainly samples obtained from outside the range. This means that many of the samples in range were found in a warmer environment with less precipitation. Compound 755 was not associated with areas with higher maximum average monthly temperatures, while Compounds 741 and 317 had a weak positive correlation with maximum temperature.


Figure 11. PCA with climate variables of Minimum average temperature in °C (Min. T_C (Wt.)), Average monthly temperature in °C (Ave. T_C (Wt.)), Compound 755 (Comp 755), Total precipitation in mm (Total_mm (Wt.)), Compound 741 (Comp 741), Maximum average monthly temperature in °C (Max. T_C (Wt.)) and Compound 317 (Comp 317). There is clear clustering between plant samples taken from in and out of range. Precipitation is higher outside of L. hermes range, while minimum and average monthly temperatures are higher inside the range. Compounds 741 and 317 are positively correlated with maximum temperatures, while Compound 755 is negatively correlated. The combined PC1 and PC2 axes represent a total of 66.61% of the variation in the data.

Table 2. Eigenvalues and percentages of explained variability (A). Pearson correlation coefficients between compounds and climatic variables for the first two principal components (PC) (B). PC1 best explains Compound 741, Ave. T_C (Wt.), Min. T_C (Wt.) and Total_mm (Wt.): these values increase with decreasing PC1. PC2 best explains Compound 755, Compound 317 and Max. T_C (Wt.): these values increase with increasing PC2.

A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 2.9712 | 0.424 | 0.424 |
| 2 | 1.6915 | 0.242 | 0.666 |
| 3 | 1.0627 | 0.152 | 0.818 |
| 4 | 0.7166 | 0.102 | 0.920 |
| 5 | 0.5021 | 0.072 | 0.992 |
| 6 | 0.0526 | 0.008 | 1.000 |
| 7 | 0.0033 | 0.000 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Compound 741 | 0.042 | -0.288 |
| Compound 755 | -0.002 | 0.619 |
| Compound 317 | 0.084 | -0.619 |
| Ave. T_C (Wt.) | -0.568 | -0.120 |
| Max. T_C (Wt.) | 0.147 | -0.362 |
| Min. T_C (Wt.) | -0.572 | -0.029 |
| Total_mm (Wt.) | 0.566 | 0.061 |
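Each column of coefficients in Table 2B has a sum of squares very close to 1, which is the normalization used for PCA eigenvector coefficients (loadings). Assuming a correlation-matrix PCA, the correlation between a standardized variable and a component score can then be recovered by multiplying a coefficient by the square root of that component's eigenvalue. The sketch below checks the normalization for Table 2B and applies that conversion; the helper names are illustrative, and the conversion is a property of correlation-matrix PCA rather than a step described in the thesis.

```python
import math

# Eigenvalues transcribed from Table 2A; PC1/PC2 coefficients from Table 2B.
eigenvalues = (2.9712, 1.6915)
coefficients = {
    "Compound 741": (0.042, -0.288),
    "Compound 755": (-0.002, 0.619),
    "Compound 317": (0.084, -0.619),
    "Ave. T_C (Wt.)": (-0.568, -0.120),
    "Max. T_C (Wt.)": (0.147, -0.362),
    "Min. T_C (Wt.)": (-0.572, -0.029),
    "Total_mm (Wt.)": (0.566, 0.061),
}

def column_norm(table, pc_index):
    """Euclidean length of one PC column; eigenvector coefficients have length 1."""
    return math.sqrt(sum(row[pc_index] ** 2 for row in table.values()))

def coefficient_to_correlation(coefficient, eigenvalue):
    """Variable-component correlation, assuming a correlation-matrix PCA."""
    return coefficient * math.sqrt(eigenvalue)

for pc_index, eigenvalue in enumerate(eigenvalues):
    print(f"PC{pc_index + 1}: column length = {column_norm(coefficients, pc_index):.3f}")
    for name, row in coefficients.items():
        r = coefficient_to_correlation(row[pc_index], eigenvalue)
        print(f"  {name}: r = {r:+.2f}")
```

For example, Min. T_C (Wt.) has a PC1 coefficient of -0.572 and PC1 has an eigenvalue of 2.9712, which corresponds to a correlation of about -0.99 with PC1.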

Figure 12 contains a PCA of Compounds 741, 755 and 317 and the soil variables of estimated N deposition (N_Dep (kgN/ha)), pH, total inorganic nitrogen (TIN_1 (gN/m2)), total N (Total_N (gN/m2)), total C (Total_C (gN/m2)) and the C:N ratio (Soil C/N). PC1 explains 29.89% of the variance extracted from the data set, while PC2 explains 20.35% (Figure 12). PC1 and PC2 had cumulative proportions of 29.9% and 50.2%, respectively (Table 3A). Pearson correlation coefficients showed that PC1 best explains N_Dep (kgN/ha), pH, TIN_1 (gN/m2), Total_C (gN/m2) and Soil C/N: these values increase with decreasing PC1. PC2 best explains Compound 741, Compound 755, Compound 317 and Total_N (gN/m2): these values increase with increasing PC2 (Table 3). Although the in-range and out-of-range data points overlap, a general trend can be seen across all plant samples. The right side of the PCA lists total inorganic N, the C:N ratio and total C, so data points on this side of the PCA are likely to have higher values of these soil variables. Compound 317 is also on this side of the PCA, suggesting that the plant may biosynthesize more of this compound where total inorganic N, C:N and total C are higher. The left side of the PCA lists pH, N deposition and total N, which are higher for the plant samples plotted on that side. Compound 741 and pH are positively correlated, meaning this compound may be more abundant in more alkaline soils. Compound 755 and total N are positively correlated, which means 755 may be biosynthesized in soils with high N content. Compound 317 is inversely related to total N, meaning its biosynthesis may be favored in soils lower in N.

Figure 12. PCA with soil variables of N deposition (N_Dep (kgN/ha)), Total N (Total_N (gN/m2)), Compound 755 (Comp 755), Total C (Total_C (gN/m2)), Soil C:N ratio (Soil C/N), Total inorganic N (TIN_1 (gN/m2)), Compound 317 (Comp 317), Compound 741 (Comp 741) and pH (overlapped with Comp 741). There is no clear clustering between samples in and out of range. Compound 755 is positively correlated with total N, while Compound 317 is negatively correlated. Compound 741 is positively correlated with soil pH but negatively correlated with total C. Soil C:N is positively correlated with total inorganic N, but both are negatively correlated with N deposition. The combined PC1 and PC2 axes represent a total of 50.24% of the variation in the data.

Table 3. Eigenvalues and percentages of explained variability (A). Pearson correlation coefficients between compounds and soil variables for the first two principal components (PC) (B). PC1 best explains N_Dep (kgN/ha), pH, TIN_1 (gN/m2), Total_C (gN/m2) and Soil C/N: these values increase with decreasing PC1. PC2 best explains Compound 741, Compound 755, Compound 317 and Total_N (gN/m2): these values increase with increasing PC2.

A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 2.6898 | 0.299 | 0.299 |
| 2 | 1.8315 | 0.204 | 0.502 |
| 3 | 1.3947 | 0.155 | 0.657 |
| 4 | 1.1505 | 0.128 | 0.785 |
| 5 | 0.7749 | 0.086 | 0.871 |
| 6 | 0.5568 | 0.062 | 0.933 |
| 7 | 0.3158 | 0.035 | 0.968 |
| 8 | 0.2646 | 0.029 | 0.998 |
| 9 | 0.0213 | 0.002 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Compound 741 | 0.182 | -0.262 |
| Compound 755 | 0.049 | 0.475 |
| Compound 317 | -0.106 | -0.598 |
| N_Dep (kgN/ha) | 0.351 | 0.141 |
| pH | 0.266 | -0.260 |
| TIN_1 (gN/m2) | -0.501 | -0.129 |
| Total_N (gN/m2) | 0.114 | 0.404 |
| Total_C (gN/m2) | -0.471 | 0.284 |
| Soil C/N | -0.523 | -0.009 |


Figure 13 contains a PCA of Compounds 741, 755 and 317 and the foliage variables of leaf N (Leaf_N (%)), leaf C (Leaf_C (%)), leaf C:N (Leaf_CN (%)), lignin (Lignin (%)) and holocellulose (Holo (%)). PC1 explains 27.51% of the variance extracted from the data set, while PC2 explains 23.18% (Figure 13). PC1 and PC2 had cumulative proportions of 27.5% and 50.7%, respectively (Table 4A). Pearson correlation coefficients showed that PC1 best explains Leaf_N (%), Leaf_C (%) and Leaf_CN (%): these values increase with decreasing PC1. PC2 best explains Compound 741, Compound 755, Compound 317, Lignin (%) and Holo (%): these values increase with increasing PC2 (Table 4). This PCA also does not show clear clustering of data points obtained from samples in and out of range, but it does show some general trends for all plant extracts. The left side of the PCA lists leaf C:N (Leaf_CN (%)) and lignin %. Because Compound 755 and lignin overlap, they are positively correlated, and presumably foliage high in lignin favors 755 production. The right side of the PCA indicates plant extracts with foliage containing higher leaf N, leaf C and holocellulose. Compounds 317 and 741 are inversely related to lignin and may be biosynthesized when lignin is low. Compound 755 is inversely related to holocellulose, while 741 and 317 are positively correlated with this variable.


Figure 13. PCA with foliage variables of leaf C:N ratio percentage (Leaf_CN (%)), Compound 755 (Comp 755), lignin percentage (overlapped with Comp 755), leaf C percentage (Leaf_C (%)), leaf N percentage (Leaf_N (%)), holocellulose percentage (Holo (%)), Compound 741 (Comp 741) and Compound 317 (Comp 317). Lignin and Compound 755 are positively correlated. Both lie opposite % holocellulose, meaning they are negatively correlated with this variable. Compounds 741 and 317 appear to be negatively correlated with lignin % and may be positively correlated with holocellulose %. Leaf N % and leaf C % are positively correlated, while both are negatively correlated with leaf C:N %. The combined PC1 and PC2 axes represent a total of 50.69% of the variation in the data.

Table 4. Eigenvalues and proportions of explained variability (A). Pearson correlation coefficients between compounds and foliage variables for the first two principal components (PC) (B). PC1 best explains Leaf_N (%), Leaf_C (%) and Leaf_CN (%): these values increase with decreasing PC1. PC2 best explains Compound 741, Compound 755, Compound 317, Lignin (%) and Holo (%): these values increase with increasing PC2.

A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 2.2010 | 0.275 | 0.275 |
| 2 | 1.8546 | 0.232 | 0.507 |
| 3 | 1.2801 | 0.160 | 0.667 |
| 4 | 0.9547 | 0.119 | 0.786 |
| 5 | 0.8206 | 0.103 | 0.889 |
| 6 | 0.5525 | 0.069 | 0.958 |
| 7 | 0.3204 | 0.040 | 0.998 |
| 8 | 0.0160 | 0.002 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Compound 741 | 0.044 | -0.159 |
| Compound 755 | -0.282 | 0.367 |
| Compound 317 | 0.088 | -0.607 |
| Leaf_N (%) | 0.598 | 0.318 |
| Leaf_C (%) | 0.277 | 0.111 |
| Leaf_CN (%) | -0.504 | -0.346 |
| Lignin (%) | -0.342 | 0.353 |
| Holo (%) | 0.327 | -0.338 |

PCAs containing kaempferol were done separately from Compounds 741, 755 and 317 because kaempferol showed statistically different concentrations in and out of range. Figure 14 contains a PCA of kaempferol and the climatic variables. PC1 explains 62.22% of the variance extracted from the data set, while PC2 explains 22.56% (Figure 14). PC1 and PC2 have cumulative proportions of 62.2% and 84.8%, respectively (Table 5A). Pearson correlation coefficients showed that PC1 best explains Ave. T_C (Wt.), Min. T_C (Wt.) and Total_mm (Wt.): these values increase with decreasing PC1. PC2 best explains the kaempferol fragment and Max. T_C (Wt.): these values increase with increasing PC2. These data points do cluster by in-range and out-of-range leaf extracts. As in the PCA of the compounds with climatic variables, samples within range are found in areas with a higher average temperature and higher minimum temperature. Samples out of range are found in areas with more precipitation and contain higher concentrations of kaempferol. Kaempferol, like precipitation, is inversely related to average temperature. This suggests that kaempferol is more commonly synthesized under conditions of higher precipitation.

Figure 14. PCA with climate variables of Minimum average temperature in °C (Min. T_C (Wt.)), Average monthly temperature in °C (Ave. T_C (Wt.)), Maximum average monthly temperature in °C (Max. T_C (Wt.)), Total precipitation in mm (Total_mm (Wt.)) and kaempferol. There is clear clustering between samples taken in and out of range. Kaempferol is found in higher abundance outside of range. It is positively correlated with total precipitation but negatively correlated with average minimum temperature. The combined PC1 and PC2 axes represent a total of 84.78% of the variation in the data.


Table 5. Eigenvalues and percentages of explained variability (A). Pearson correlation coefficients between the kaempferol fragment and climatic variables for the first two principal components (PC) (B). PC1 best explains Ave. T_C (Wt.), Min. T_C (Wt.) and Total_mm (Wt.): these values increase with decreasing PC1. PC2 best explains the kaempferol fragment and Max. T_C (Wt.): these values increase with increasing PC2.

A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 3.1108 | 0.622 | 0.622 |
| 2 | 1.1280 | 0.226 | 0.848 |
| 3 | 0.6842 | 0.137 | 0.985 |
| 4 | 0.0675 | 0.014 | 0.998 |
| 5 | 0.0094 | 0.002 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Kaempferol fragment | -0.311 | 0.427 |
| Ave. T_C (Wt.) | 0.556 | -0.001 |
| Max. T_C (Wt.) | 0.012 | -0.875 |
| Min. T_C (Wt.) | 0.542 | 0.223 |
| Total_mm (Wt.) | -0.548 | -0.042 |

Figure 15 contains a PCA of kaempferol and the soil variables. PC1 explains 40.55% of the variance extracted from the data set, while PC2 explains 23.04% (Figure 15). PC1 and PC2 have cumulative proportions of 40.5% and 63.6%, respectively (Table 6A). Pearson correlation coefficients showed that PC1 best explains N_Dep (kgN/ha), pH, TIN_1 (gN/m2) and Soil C/N: these values increase with decreasing PC1. PC2 best explains the kaempferol fragment, Total_N (gN/m2) and Total_C (gN/m2): these values increase with increasing PC2. There is heavy overlap of data points in and out of range. The trend for all plant extracts is that kaempferol is inversely correlated with the soil C:N ratio and positively correlated with N deposition. Thus, kaempferol appears to be more commonly biosynthesized in areas with high soil N, which implies higher N availability.


Figure 15. PCA with soil variables of Soil C:N ratio (Soil C/N), Total inorganic N (TIN_1 (gN/m2)), Total C (Total_C (gN/m2)), Total N (Total_N (gN/m2)), kaempferol, pH and N deposition (N_Dep (kgN/ha)). Kaempferol may be found in higher abundance in areas with higher total N, pH and N deposition. It is negatively correlated with soil C:N but shows no correlation with total inorganic N or total C. The combined PC1 and PC2 axes represent a total of 63.59% of the variation in the data.

Table 6. Eigenvalues and percentages of explained variability (A). Pearson correlation coefficients between the kaempferol fragment and soil variables for the first two principal components (PC) (B). PC1 best explains N_Dep (kgN/ha), pH, TIN_1 (gN/m2) and Soil C/N: these values increase with decreasing PC1. PC2 best explains the kaempferol fragment, Total_N (gN/m2) and Total_C (gN/m2): these values increase with increasing PC2.


A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 2.8382 | 0.405 | 0.405 |
| 2 | 1.6130 | 0.230 | 0.636 |
| 3 | 1.0676 | 0.153 | 0.788 |
| 4 | 0.7986 | 0.114 | 0.902 |
| 5 | 0.3472 | 0.050 | 0.952 |
| 6 | 0.2588 | 0.037 | 0.989 |
| 7 | 0.0766 | 0.011 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Kaempferol fragment | 0.330 | 0.408 |
| N_Dep (kgN/ha) | 0.360 | -0.153 |
| pH | 0.286 | 0.021 |
| TIN_1 (gN/m2) | -0.490 | 0.260 |
| Total_N (gN/m2) | 0.191 | 0.673 |
| Total_C (gN/m2) | -0.424 | 0.479 |
| Soil C/N | -0.472 | -0.246 |

Figure 16 contains a PCA of kaempferol and the foliage variables. PC1 explains 41.96% of the variance extracted from the data set, while PC2 explains 25.65% (Figure 16). PC1 and PC2 have cumulative proportions of 42.0% and 67.6%, respectively (Table 7A). Pearson correlation coefficients showed that PC1 best explains Leaf_N (%), Leaf_C (%) and Leaf_CN (%): these values increase with decreasing PC1. PC2 best explains the kaempferol fragment, Lignin (%) and Holo (%): these values increase with increasing PC2. Although samples in and out of range overlap, the general trend shows kaempferol and leaf N % to be negatively correlated with the leaf C:N ratio. Lignin % is negatively correlated with holocellulose % but positively correlated with leaf C %.


Figure 16. PCA with foliage variables of Leaf C percentage (Leaf_C (%)), Lignin percentage, Kaempferol, percentage of leaf C:N ratio (Leaf_CN (%)), Leaf N percentage (Leaf_N (%)) and Holocellulose percentage (Holo (%)). Kaempferol is negatively correlated with leaf C:N and may be positively correlated with lignin %. It may also be found in higher abundance with higher leaf N, leaf C and holocellulose %. The combined PC1 and PC2 axes represent a total of 67.61% of the variation in the data.

Table 7. Eigenvalues and percentages of explained variability (A). Pearson correlation coefficients between the kaempferol fragment and foliage variables for the first two principal components (PC) (B). PC1 best explains Leaf_N (%), Leaf_C (%) and Leaf_CN (%): these values increase with decreasing PC1. PC2 best explains the kaempferol fragment, Lignin (%) and Holo (%): these values increase with increasing PC2.


A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 2.5178 | 0.420 | 0.420 |
| 2 | 1.5389 | 0.256 | 0.676 |
| 3 | 0.8556 | 0.143 | 0.819 |
| 4 | 0.6333 | 0.106 | 0.924 |
| 5 | 0.4331 | 0.072 | 0.996 |
| 6 | 0.0213 | 0.004 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Kaempferol fragment | 0.191 | 0.620 |
| Leaf_N (%) | 0.616 | 0.091 |
| Leaf_C (%) | 0.420 | -0.290 |
| Leaf_CN (%) | -0.541 | -0.255 |
| Lignin (%) | -0.152 | 0.590 |
| Holo (%) | 0.304 | -0.331 |

Lastly, because R. crocea plants collected outside of range grew at higher elevations, the kaempferol-containing compounds were also checked against altitude. Figure 17 shows a PCA with kaempferol and elevation. R. crocea plants outside L. hermes range grew at a significantly higher elevation than those inside the range (p-value = 0.012). PC1 explains 51.28% of the variance extracted from the data set, while PC2 explains 48.72% (Figure 17). PC1 and PC2 have cumulative proportions of 51.3% and 100%, respectively (Table 8A). Clustering exists between samples in range and samples outside of range. The kaempferol and elevation arrows both point toward the samples outside of L. hermes range, which may indicate that both variables are higher for samples outside the range.
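The elevation comparison can be sketched with Welch's two-sample t-test, which does not assume equal variances, using the site elevations listed in Table 9 (site SD163649, whose elevation is N/A, is left out). The thesis does not state which test variant or data subset produced p = 0.012, so this sketch is not expected to reproduce that value; it only computes the t statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library, and the function name is illustrative. A p-value would then come from the t distribution, for example via scipy.stats.ttest_ind with equal_var=False.

```python
import math
import statistics

# Elevations (m) transcribed from Table 9. Site SD163649 (elevation N/A) is omitted.
inside_m = [138, 281, 187, 144, 352, 122, 49, 49, 72, 210, 152, 29, 109, 49, 95]
outside_m = [610, 595, 378, 610, 537, 402, 129, 80, 224, 232, 96, 488, 846, 28]

def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a) and the Welch-Satterthwaite df."""
    se2_a = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(b) - statistics.mean(a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

t, df = welch_t(inside_m, outside_m)
print(f"mean elevation inside range:  {statistics.mean(inside_m):.0f} m (n = {len(inside_m)})")
print(f"mean elevation outside range: {statistics.mean(outside_m):.0f} m (n = {len(outside_m)})")
print(f"Welch t = {t:.2f}, df = {df:.1f}")
```

Welch's version is used here because the spread of elevations outside the range is visibly larger than inside it; a pooled-variance t-test would be a defensible alternative only if the two variances were similar.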


Figure 17. A PCA of kaempferol and elevation. There is clear clustering of data points in range as well as out of range. The arrows for kaempferol and elevation both point toward out of range, implying that the R. crocea plants out of range contain higher concentrations of kaempferol and grew at a higher elevation. PC1 and PC2 combined explain 100% of the variation in the data.

Table 8. Eigenvalues and percentages of explained variability (A). Pearson correlation coefficients between the kaempferol fragment and elevation for the first two principal components (PC) (B).

A.

| PC | Eigenvalue | Proportion | Cumulative |
|---|---|---|---|
| 1 | 1.0255 | 0.513 | 0.513 |
| 2 | 0.9745 | 0.487 | 1.000 |

B.

| Parameter/PC | PC1 | PC2 |
|---|---|---|
| Elevation | 0.707 | -0.707 |
| Kaempferol fragment | -0.707 | -0.707 |
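With only two variables, the Table 8 results can be checked by hand. Assuming the PCA was run on the correlation matrix, which the eigenvalues summing to 2 supports, the correlation matrix of elevation and kaempferol has the form below, where r is their Pearson correlation. Its eigenvalues are 1 + r and 1 - r, with eigenvectors proportional to (1, 1) and (1, -1). The coefficients of ±0.707 (≈ 1/√2) in Table 8B are therefore fixed by the method rather than by the data, and two components always account for 100% of the variance.

```latex
\begin{aligned}
\mathbf{R} &= \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \\
\mathbf{R}\begin{pmatrix} 1 \\ 1 \end{pmatrix} &= (1 + r)\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\mathbf{R}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = (1 - r)\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \\
\lambda_1 + \lambda_2 &= (1 + r) + (1 - r) = 2 = 1.0255 + 0.9745, \\
\lambda_1 &= 1 - r = 1.0255 \;\Longrightarrow\; r = -0.0255, \\
\frac{\lambda_1}{\lambda_1 + \lambda_2} &= \frac{1.0255}{2} \approx 0.513.
\end{aligned}
```

Because PC1 in Table 8B has coefficients of opposite sign, its eigenvalue corresponds to 1 - r, which gives r ≈ -0.026. Under this assumption, elevation and kaempferol are close to uncorrelated across all samples, which is consistent with their coefficient vectors, (0.707, -0.707) and (-0.707, -0.707), lying at a right angle in the PC1-PC2 plane.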


Points of Interest

One way R. crocea may be communicating with L. hermes is through plant volatiles. Nonconjugated molecules can cross membranes and evaporate into the atmosphere (Pichersky, et al., 2006). Free volatiles are most likely to accumulate in membranes and are in some cases glycosylated and then stored in vacuoles (Pichersky, et al., 2006). The structure of rhamnocrocin contains three sugar units. The methyl groups on the methylated form of rhamnocrocin make the molecule slightly lipophilic. Lipophilicity and a high vapor pressure are common characteristics of plant volatiles (Pichersky, et al., 2006). Being glycosylated and lipophilic, the methylated form of rhamnocrocin partly fits the description of a typical plant volatile. However, with such a large molecular weight and high polarity, it is unlikely to be a plant volatile. Biosynthesis rates of plant volatiles are highest in young leaves, which are not fully developed and need the most protection (Pichersky, et al., 2006). Interestingly, L. hermes is known to lay its eggs on branches underneath new leaf growth. It seems possible that the biosynthesis of a kaempferol-containing compound may attract L. hermes to oviposit and thus provide protection to the plant. In addition, plant volatiles containing an aromatic ring are formed through the pathway leading from shikimate to phenylalanine, which also leads to primary and secondary nonvolatile compounds (Pichersky, et al., 2006). One such primary compound is lignin, which may be why Compound 755 is positively correlated with lignin.

Plant volatiles are influenced by environmental factors such as light, temperature and moisture (Pichersky, et al., 2006). Severe water deficiency has been demonstrated to increase secondary metabolite production (Yang et al., 2018). The flavonoid concentration in Pisum sativum was shown to increase by 45% under drought conditions compared to a well-watered control (Yang et al., 2018). Plant extracts collected from inside L. hermes range originated from areas lower in precipitation, while those from outside the range came from areas with higher precipitation (Figures 11 and 14). Based on this, higher concentrations of Compounds 741, 755 and 317 and kaempferol would be expected within the range of L. hermes; however, this was not the case. There was no statistically significant difference in the concentrations of Compounds 741, 755 and 317 in and out of L. hermes range. For kaempferol, the opposite was observed: a t-test and PCA showed that kaempferol was more likely to be found outside the range in areas with higher precipitation.

Photosynthesis, one of the fundamental processes needed to construct secondary metabolites, begins with light. According to Yang, et al., light is essential in promoting plant growth as well as inducing or regulating plant metabolism. In response to their environment, light allows plants to biosynthesize secondary metabolites such as phenolic compounds, triterpenoids and flavonoids. For example, after being subjected to long (16-hour) light irradiation, leaves of Ipomoea batatas responded with a dramatic increase in flavonoids such as flavanols. In Centella asiatica, flavonoids were positively correlated with growth-lighting conditions, and in Populus trichocarpa, UV-B irradiation produced the largest increase in flavonoids such as kaempferol. These examples all suggest that flavonoids may be used by plants to protect against light exposure. R. crocea plants outside of L. hermes range were found to grow at significantly higher elevations than those inside the range. UV radiation, especially UV-B, increases at higher altitudes (Rana et al., 2020). In addition, increased altitude results in a lower atmospheric temperature, which has been shown to increase the production of phenolic compounds (Rana et al., 2020). To protect themselves from the influx of UV-B, the R. crocea plants outside the range may be synthesizing more kaempferol-containing molecules, which may act as a UV filter.


Kaempferol was also positively correlated with total soil N and N deposition. N is important for metabolic processes, which is why plant tissues with low N content are of lower quality as food (Fei et al., 2017). N demand and N concentrations are highest in young, growing plant tissues and decline as the plant ages (Fei et al., 2017).

Flavonoids have been reported to regulate oviposition and feeding (Mierziak et al., 2014). Compounds 741, 755 and 317 and kaempferol are all classified as flavonoids and therefore may influence oviposition. Females typically select plants suitable for oviposition based on visual, olfactory and gustatory information (Fei et al., 2017). A host plant is generally chosen for containing a high concentration of primary metabolites and a low concentration of secondary metabolites (Fei et al., 2017). This is consistent with kaempferol existing in higher concentrations outside the range of L. hermes. A leaf's nutritional quality is typically higher when it is young and developing than when it is mature (Fei et al., 2017). This is most likely why L. hermes chooses to lay its eggs under new leaf growth, as insects tend to develop better and reach higher densities on new plant tissues than on older tissues (Fei et al., 2017).

Secondary metabolites are often responsible for plant defense against herbivores which may deter oviposition and interfere with an insect’s physiology once ingested (Fei et al., 2017).

Previous studies have shown that butterflies require physical contact with the host plant to recognize it as a potential oviposition site (Fei et al., 2017).

L. hermes seems to prefer a range that is lower in precipitation, higher in average and minimum monthly temperature, and lower in kaempferol concentration. Kaempferol forms the backbone of many of the most abundant flavonoids in R. crocea. What may be confining L. hermes to its small habitat may in fact be kaempferol concentration: within this range the concentration may be optimal, whereas outside the range there is too much. This may be sensed through volatile phytochemicals or through visual, olfactory and/or gustatory information. The climate also seems to influence the butterflies' sedentary habits, as a majority of the leaf samples from inside the habitat were found in areas with higher average and minimum temperatures.

The investigation of these ecologically important questions was made possible with LC-MS/MS. Next steps for this project include running selected ion monitoring (SIM) at m/z 285 to capture cases of degradation, rather than MS processes, where there had been sugars at two positions on the kaempferol molecule. Using a t-test, the kaempferol ion at m/z 287 should be compared to the 22 different kaempferol-containing molecules determined in the plant samples (Figure 9). The glu 1 and glu 2 sugars detected in the previous work will also be identified.
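The two nominal masses mentioned above are consistent with the singly charged kaempferol aglycone (C15H10O6) observed in the two ionization polarities. The text does not state the polarity here, so assigning m/z 287 to [M+H]+ and m/z 285 to [M-H]- in the sketch below is an assumption. The code sums standard monoisotopic atomic masses and adds or subtracts the mass of a proton; the function name is illustrative.

```python
# Monoisotopic masses (u) of 12C, 1H and 16O, and the mass of a proton.
MASS_C = 12.0
MASS_H = 1.00782503
MASS_O = 15.99491462
MASS_PROTON = 1.00727646

def monoisotopic_mass(n_c, n_h, n_o):
    """Neutral monoisotopic mass of a molecule with formula C(n_c) H(n_h) O(n_o)."""
    return n_c * MASS_C + n_h * MASS_H + n_o * MASS_O

# Kaempferol aglycone, C15H10O6.
kaempferol = monoisotopic_mass(15, 10, 6)
mz_positive = kaempferol + MASS_PROTON  # [M+H]+, assumed positive-ion mode
mz_negative = kaempferol - MASS_PROTON  # [M-H]-, assumed negative-ion mode

print(f"M = {kaempferol:.4f} u")
print(f"[M+H]+ = m/z {mz_positive:.4f}")
print(f"[M-H]- = m/z {mz_negative:.4f}")
```

This gives a neutral monoisotopic mass of about 286.048 u, [M+H]+ at about m/z 287.055 and [M-H]- at about m/z 285.040.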

Many insects use phytochemicals for host plant recognition (Fei et al., 2017). Another interesting experiment would be to test whether the kaempferol in R. crocea leaves can be detected under UV radiation in differing amounts in and out of range. To do this, leaves would be collected from one R. crocea plant in range and one out of range. Using reflectance spectroscopy, the leaves would be examined at 265 nm and 365 nm, the UV wavelengths that the kaempferol molecule absorbs. Lepidopterans can sequester flavonoids, the vast majority of which end up in their wings, where they are presumably used for species identification. Should R. crocea leaves absorb at these wavelengths, this may be determined to be a visual cue that aids L. hermes in identifying suitable host plants.

ACKNOWLEDGEMENTS

This research and writing would not have been possible without the tremendous support and assistance I received throughout the past two years.


First, I would like to thank my PI, Dr. Jackie Trischman, whose expertise guided my research questions. Thank you for your unwavering support of my ambitions and your mentorship over the past 4.5 years, from my time as an undergraduate to now as a graduate student. You have been an enormous inspiration to me, and your lab has shown me that there should be no boundaries in STEM. It has been an honor working in your lab and something I will look back on fondly.

I would like to acknowledge Robyn Araiza for your outstanding mentorship throughout this entire project. Your help and feedback were invaluable and helped me to grow my instrumentation and critical thinking skills. This project would not be what it is today without your help.

I would also like to thank Dr. Vourlitis for beginning such a fascinating project and Liberty Isbell for providing the research foundation this project was built on. Dr. Vourlitis, thank you for serving on my committee and for such valuable feedback throughout my research and writing process.

In addition, I would like to thank Dr. Schmidt and Dr. Iafe. Dr. Schmidt, thank you for serving on my committee and for your attention to detail throughout my thesis as it has improved my writing and presentation skills. Dr. Iafe, thank you for being such an organized graduate coordinator resulting in my timely conclusion of this program.

Lastly, I would like to acknowledge my parents and boyfriend for your unwavering support and belief in me. To my cohort, thank you for your friendship and being there with me every step of the way.


REFERENCES

1. Agilent Technologies Agilent 6400 Series Triple Quadrupole LC/MS System Concepts

Guide: The Big Picture; Agilent Technologies: Santa Clara, 2012.

2. Alarcón, J., Cespedes, C.L.; Phytochem Rev, 2015, 14, 389–401.

3. Bruce, T. J. A.; J. Exp. Bot., 2014, 66, 2, 455-465.

4. Calderon-Montano, J. M.; Burgos–Moron, E.; Perez–Guerrero, M.; Lopez–Lazaro, M.;

Mini-Rev. Med. Chem., 2011, 11, 4, 298-344.

5. Deutschman, D. H.; Berres, M. E.; Marschalek, D. A.; Strahm, S. L. Two-year evaluation

of Hermes Copper (Lycaena Hermes); 2011; p. 8.

6. Dewick, P. M.; Medicinal Natural Products: A Biosynthetic Approach, 3rd Edition.; John

Wiley & Sons, Ltd: Chichester, 2009.

7. Emmel T. C., Emmel J. F.; The butterflies of Southern California, Nat Hist Museum Los

Angeles County, 1973.

8. Fei, M.; Harvey, J. A.; Yin, Y.; Gols, R.; J. Chem. Ecol., 2017, 43, 6, 617–629.

9. Garcia-Barros, E.; Fartmann, T. Ecology of Butterflies in Europe; Cambridge University

Press: Cambridge, 2009.

10. Gentili, R.; Ambrosini, R.; Montagnani, C.; Caronni, S.; Citterio, S.; Front. Plant Sci.,

2018, 9, 1335.

11. Hogan, D. Petition to List the Hermes Copper Butterfly (Hermelycaena [Lycaena]

hermes) as Endangered Under the Endangered Species Act; 2006.

12. Isbell, L.; Effects of N and Climate on Hermes Copper Butterfly (Lycaena hermes)

Habitat in Southern California, 2020.

47 Dubord

13. Karkare, P. Principal Component Analysis – A Brief Introduction. Medium

(https://medium.com/x8-the-ai-community/principal-component-analysis-a-brief-

introduction-dc8cf3e03c71) (accessed Mar 13, 2021).

14. Li, S.; Wang, Z.; Miao, Y.; Li, S. J. Integr. Agric., 2014, 13, 10, 2061-2080.

15. Mai, L. P.; Gueritte, F.; Dumontet, V.; Tri, M. V.; Hill, B.; Thoison, O.; Guenard, D.;

Sevenet, T.; J. Nat. Prod., 2001, 64, 1162-1168.

16. Marschalek, D. A.; Deutschman, D. H.; J. Insect Conserv., 2008, 12, 97-105.

17. Marschalek, D. A.; Deutschman, D. H.; Strahm, S.; Berres, M. E.; Ecol. Entomol., 2016,

41, 327–337.

18. Mierziak, J.; Kostyn, K.; Kulma, A.; Molecules, 2014, 19, 10, 16240-16265.

19. Montalvo, A.; Riordan, E.; Beyers, J; Gellie, N., Plant Profile for Rhamnus crocea and

Rhamnus ilicifolia, 2020.

20. Pichersky, E.; Noel, J. P.; Dudareva, N.; PMC, 2006, 311, 5762, 808-811.

21. Powell, V. Principal Component Analysis. Setosa.io (https://setosa.io/ev/principal-component-analysis/) (accessed Mar 13, 2021).

22. Ramakrishna, A.; Ravishankar, G. A.; Plant Signal. Behav., 2011, 6, 11, 1720-1731.

23. Rana, P. S.; Saklani, P.; Chandel, C.; Res. J. Med. Plant, 2020, 14, 43-52.

24. Reisenman, C. E.; Riffell, J. A.; Bernays, E. A.; Hildebrand, J. G.; Proc. R. Soc. B, 2010, 277, 1692, 2371-2379.

25. Rowell, R. M.; Han, J. S.; Rowell, J. S.; Natural Polymers and Agrofibers Composites, 2000, 115-134.

26. Samanta, A.; Das, S. K.; Das, G.; Int. J. Pharm. Sci. Tech., 2011, 6, 1, 12-35.

27. Staff of Carlsbad Fish and Wildlife Service, Federal Register Volume 71, Number 152, 2006.

28. Stevens, C. J.; David, T. I.; Storkey, J.; Funct. Ecol., 2018, 32, 7, 1757-1769.

29. Thorne, F.; J. Res. Lepid., 1963, 2, 143-150.

30. Vourlitis, G.; Hermes Copper Butterfly Chemical Ecology Study (US Fish and Wildlife Service Grant F17AC00960); 2018.

31. Wiesen, B.; Krug, E.; Fiedler, K.; J. Chem. Ecol., 1994, 20, 2523-2538.

32. Yang, L.; Wen, K.; Ruan, X.; Zhao, Y.; Wei, F.; Wang, Q.; Molecules, 2018, 23, 4, 762.

33. Zhang, J.; He, N.; Liu, C.; Xu, L.; Chen, Z.; Li, Y.; Wang, R.; Yu, G.; Sun, W.; Xiao, C.; Chen, H. Y. H.; Reich, P. B.; Glob. Change Biol., 2019, 26, 4, 2534-2543.

34. Zhijing, X.; Shaoshan, A.; MDPI, 2018, 10, 4757.

APPENDICES

Figure 18. Maps of Southern California generated in Google Earth, with pins marking the study sites where R. crocea samples were collected inside (A) and outside (B) L. hermes habitat (Vourlitis, 2018).

Table 9. Site names, latitude, longitude, elevation, and estimated N deposition for R. crocea study sites inside and outside L. hermes range (Isbell, 2020).

Inside L. hermes habitat

| Site | Latitude | Longitude | Elevation (m) | Est. N deposition (kg N ha⁻¹ yr⁻¹) |
|---|---|---|---|---|
| Elfin Forest (EF) | 33.074 | -117.157 | 138 | 11.73 |
| Black Mountain (BM) | 32.977 | -117.123 | 281 | 12.38 |
| Meadowbrook (MB) | 32.964 | -117.069 | 187 | 12.13 |
| Mission Trails (MT) | 32.833 | -117.038 | 144 | 10.77 |
| McGinty Peak (MP) | 32.758 | -116.851 | 352 | 9.87 |
| mg20763 | 33.063 | -117.083 | 122 | 11.91 |
| cbo86186 | 33.025 | -117.171 | 49 | 12.13 |
| SD195367 | 32.938 | -117.213 | 49 | 12.83 |
| cbo29765 | 32.938 | -117.134 | 72 | 12.23 |
| oe3104 | 32.951 | -117.017 | 210 | 10.43 |
| SD211218 | 32.869 | -116.968 | 152 | 10.94 |
| SD208086 | 32.711 | -117.079 | 29 | 9.79 |
| cbo37640 | 32.925 | -117.162 | 109 | 11.5 |
| SD182404 | 33.044 | -117.153 | 49 | 11.73 |
| in:9513993 | 32.832 | -117.104 | 95 | 12.15 |

Outside L. hermes habitat

| Site | Latitude | Longitude | Elevation (m) | Est. N deposition (kg N ha⁻¹ yr⁻¹) |
|---|---|---|---|---|
| UCR102732 | 33.781 | -117.056 | 610 | 10.84 |
| in:8546144 | 33.807 | -117.354 | 595 | 10.45 |
| UCR241774 | 33.725 | -117.392 | 378 | 8.37 |
| UCR100298 | 33.8 | -117.061 | 610 | 11.2 |
| UCR260565 | 33.641 | -117.226 | 537 | 9.47 |
| UCR249823 | 33.598 | -117.142 | 402 | 11.19 |
| cbo43336 | 33.308 | -117.232 | 129 | 13.36 |
| cbo43271 | 33.315 | -117.234 | 80 | 13.36 |
| cbo53916 | 33.366 | -117.153 | 224 | 12.34 |
| SD201912 | 33.168 | -117.094 | 232 | 13.33 |
| cbo73769 | 33.171 | -117.275 | 96 | 12.93 |
| UCR270175 | 33.466 | -117.042 | 488 | 10.08 |
| UCR1131 | 33.386 | -116.79 | 846 | 6.32 |
| SD163649 | 33.259 | -117.141 | N/A | 12.48 |
| cbo76139 | 33.093 | -117.298 | 28 | 10.44 |
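For readers who want to work with the Table 9 values directly, the following is a minimal illustrative sketch, not part of the original analysis. It transcribes the two site groups into Python dictionaries and computes descriptive group means for estimated N deposition and elevation. The names `INSIDE`, `OUTSIDE`, and `summarize` are hypothetical and chosen for this example only; the one missing elevation (SD163649, N/A) is stored as `None` and excluded from the elevation mean, which is an assumption about how that gap should be handled.

```python
from statistics import mean

# Values transcribed from Table 9 (Isbell, 2020).
# Each site maps to (elevation in m, estimated N deposition in kg N ha^-1 yr^-1).
# None marks the missing (N/A) elevation for SD163649.
INSIDE = {
    "Elfin Forest (EF)": (138, 11.73), "Black Mountain (BM)": (281, 12.38),
    "Meadowbrook (MB)": (187, 12.13), "Mission Trails (MT)": (144, 10.77),
    "McGinty Peak (MP)": (352, 9.87), "mg20763": (122, 11.91),
    "cbo86186": (49, 12.13), "SD195367": (49, 12.83),
    "cbo29765": (72, 12.23), "oe3104": (210, 10.43),
    "SD211218": (152, 10.94), "SD208086": (29, 9.79),
    "cbo37640": (109, 11.5), "SD182404": (49, 11.73),
    "in:9513993": (95, 12.15),
}
OUTSIDE = {
    "UCR102732": (610, 10.84), "in:8546144": (595, 10.45),
    "UCR241774": (378, 8.37), "UCR100298": (610, 11.2),
    "UCR260565": (537, 9.47), "UCR249823": (402, 11.19),
    "cbo43336": (129, 13.36), "cbo43271": (80, 13.36),
    "cbo53916": (224, 12.34), "SD201912": (232, 13.33),
    "cbo73769": (96, 12.93), "UCR270175": (488, 10.08),
    "UCR1131": (846, 6.32), "SD163649": (None, 12.48),
    "cbo76139": (28, 10.44),
}


def summarize(sites):
    """Return the site count, mean N deposition, and mean elevation for one group."""
    n_dep = [n for _, n in sites.values()]
    # Skip sites with no recorded elevation rather than treating them as zero.
    elev = [e for e, _ in sites.values() if e is not None]
    return {
        "n_sites": len(sites),
        "mean_n_dep": round(mean(n_dep), 2),
        "mean_elev_m": round(mean(elev), 1),
    }


if __name__ == "__main__":
    for label, group in (("Inside", INSIDE), ("Outside", OUTSIDE)):
        print(label, summarize(group))
```

With these values, the inside group averages about 11.50 kg N ha⁻¹ yr⁻¹ at a mean elevation near 136 m, and the outside group about 11.08 kg N ha⁻¹ yr⁻¹ at a mean elevation near 375 m (14 sites with elevation data). These are descriptive means only; they do not by themselves indicate whether the groups differ meaningfully.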
