
FOLIAR ENDOPHYTIC FUNGI

OF THE NATIVE HAWAIIAN

______

A University Thesis Presented to the Faculty

of

California State University, East Bay

______

In Partial Fulfillment

of the Requirements for the Degree

M.S. in Biological Sciences

______

By

Sean Omea Imanishi Swift

December 2016

Copyright © 2016 by Sean Swift

Abstract

Foliar endophytic fungi (FEF) have been found living asymptomatically within the leaf tissues of all land plants sampled thus far. Hawaii presents a unique landscape for examining the ecology and evolution of these cryptic fungal symbionts. The isolation of the Hawaiian archipelago provides a strong barrier to colonization by host plants and their associated endophytes. The native plant genus Scaevola (Goodeniaceae) is the result of three separate colonization events and has adapted to a variety of habitats, including coastal strand, rain forest, and exposed lava flow. This project focused on quantitatively assessing FEF diversity and community structure in Scaevola through a combination of culture-based methods and high-throughput environmental sequencing. Leaf samples were collected from 35 individuals of Scaevola representing 8 species from three islands. Cultured endophytes were grouped into Molecular Operational Taxonomic Units (OTUs) based on 97% similarity of the nuclear ribosomal internal transcribed spacer region (ITS). Sequences of the nuclear large ribosomal subunit (LSU) were generated from isolates representing each OTU. Phylogenies were constructed using LSU sequence data and annotated with ecological data derived from Illumina® sequencing. Endophyte community composition was compared on the basis of host genetics (e.g. host species and host lineage) and abiotic environmental factors (e.g. elevation and mean annual temperature).


Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. 1360626. Thank you to all the people and organizations that helped with our fieldwork in Hawai'i, including the National Tropical Botanic Garden, The Nature Conservancy, Plant Extinction Prevention Program, and the Department of Land and Natural Resources. I want to thank a great many people for helping me get to this point. My parents Brian and Yuri and my sister Erin for their constant support and encouragement. Dr. Brian Perry, for freely sharing his wisdom on fungi and all other subjects. I could not have asked for a more intelligent and generous mentor. Thank you to my committee members Dr. Maria Gallegos and Dr. Christopher Baysdorfer for their careful reading and helpful comments. For their assistance with this project, I must thank Dr. Anthony Amend, Gerald Cobian, Dr. Geoffrey Zahn, Erin Datlof, Don Hemmes, and Alex Danza. I am also indebted to the many botanists who helped us along the way, including Steve Perlman, Tim Flynn, Kristen Coelho, Adam Williams, Seana Walsh, and Jesse Adams. Members of the Perry lab Devin Schaefferkoetter and Jonathan del Rosario provided labor, tolerance, and a great deal of kindness, all of which I greatly appreciate. Lynx Gallagher, my erstwhile mentor in Hawaiian mycology and bad behavior, is acknowledged for his crucial role in my graduate education. I would also like to recognize Dr. Donald Roeder for introducing me to mycology and setting me on this unorthodox path. Finally, for her endless patience, love, and help in the lab, a heartfelt thank you to Patricia Sendão, whom it is impossible to adequately thank.

Table of Contents

Abstract
Acknowledgements
List of Figures
List of Tables
Introduction
    Nature of The Association
    Environmental Factors
    Host Genotype
    Evolution And Coevolution
    Implications for Scaevola
    Specific Research Questions
    Hawaiian Endophytes
    Methodological Precedence
Methods
    Sample Collection And Processing
    Fungal Culturing
    DNA Extraction, Amplification, and Sequencing
    Environmental PCR and Illumina® Sequencing
    Analysis
Results
    Summary Data from Cultured Isolates
    Phylogenetics
    Summary Data from Illumina® Sequencing
    Diversity of OTUs from Illumina® Sequencing
    Sample Dissimilarity
    Combining Cultured Isolate Phylogenies and Illumina® Sequencing Data
Discussion
    Phylogenetic Placement and Evolutionary Hypotheses
    The Dominant Genus Colletotrichum
    Community Ecology
    Synthesis of Culture and Environmental Sequencing Data
    Future Directions for Scaevola
References
Appendix 1: Sample Metadata
Appendix 2: Bioinformatics Pipeline for Processing Illumina Data
Appendix 3: Data Analysis in R

List of Figures

Figure 1. A map showing sample sites for Hawaiian Scaevola species used in this project. Lines indicate regions with equal annual rainfall totals ranging from 250 mm/year at low elevation sites to 4000 mm/year at higher elevation sites. Lines indicate increments of 500 mm/year rainfall totals (rainfall data from Frazier, Giambelluca, Diaz, & Needham, 2015).

Figure 2. Illustration of ribosomal cassette showing conserved primer locations and approximate amplicon lengths for two relevant gene regions: nuclear ribosomal Internal Transcribed Spacer (ITS), and nuclear ribosomal Large Sub-Unit (LSU).

Figure 3. Graph showing distinct OTUs isolated from each species. Each bar is colored by QIIME assigned taxonomy of OTUs at the class level.

Figure 4. Number of Scaevola individuals each OTU was isolated from (total individuals successfully isolated from = 23). Most OTUs (29) were encountered a single time. The most common and widespread OTU (EU552111) was isolated from 11 individuals.

Figure 5. A maximum likelihood tree of all fungal isolate OTUs shown in the context of . Enlarged phylogeny on the right shows placement of isolates within . Isolates are colored by genus level QIIME assigned taxonomy. Branch labels indicate bootstrap support from 100 bootstrap replicates.

Figure 6. Enlarged phylogeny on the right shows placement of isolates within Dothideomycetes. Isolates are colored by genus level QIIME assigned taxonomy. Branch labels indicate bootstrap support from 100 bootstrap replicates.

Figure 7. A maximum likelihood tree of isolates assigned to order Glomerellales. Isolates are colored by species level taxonomy assigned by QIIME. Branch labels indicate bootstrap support from 100 bootstrap replicates.

Figure 8. Distinct OTUs recovered from each species through environmental PCR and Illumina® sequencing. Dominant classes of fungal endophytes included Dothideomycetes and Sordariomycetes. Many OTUs remained unidentified at the class level.

Figure 9. Histogram showing the number of plant individuals each OTU was recovered from. The vast majority of OTUs were not encountered in more than one individual. 826 OTUs were encountered a single time while 54 OTUs were encountered more than 3 times.

Figure 10. A plot showing Tukey HSD of Shannon diversity of endophytic community for samples from the different host lineages. There were no significant differences in the mean per-sample diversity of endophytic communities between the three host lineages.

Figure 11. OTU rarefaction curves by sample. Sequencing depth and OTU sampling effectiveness varied by sample. OTU accumulation did not plateau for any sample.

Figure 12. OTU rarefaction curves by host species. Sequencing depth and OTU sampling effectiveness varied by species. OTU accumulation did not plateau for any of the sampled host species.

Figure 13. OTU rarefaction curves by host lineage. Lineage A was comprised of S. glabra. Lineage B was comprised of S. taccada. Lineage C was comprised of S. mollis, S. gaudichaudii, S. gaudichaudiana, S. procera, and S. chamissoniana. Though Lineage C had the most sampling effort and highest number of OTUs, accumulation of OTUs did not plateau.

Figure 14. OTU rarefaction curve across all samples. Total sampling effort did not adequately recover the total endophytic community present in sampled Scaevola species.

Figure 15. Non-metric multi-dimensional scaling (NMDS) visualization of Bray-Curtis dissimilarity of endophytic communities between samples. Samples are colored by host species and island of origin. Additional plot information is available in Appendix 3.

Figure 16. NMDS plot of Bray-Curtis dissimilarity of endophyte communities between samples. Samples are colored by host plant lineage with a shaded ellipse showing the 95% confidence interval for placement of samples in Lineage C. Additional plot information is available in Appendix 3.

Figure 17. NMDS plot of Bray-Curtis dissimilarity of endophyte communities between samples. Samples are colored by Mean Annual Temperature (TempFactor) at the collection site. Shaded ellipses show 95% confidence interval for placement of samples at different levels of MAT. Additional plot information is available in Appendix 3.

Figure 18. Three-dimensional visualization of redundancy analysis (RDA) of Bray-Curtis dissimilarity constrained by mean annual temperature (temp) and host lineage. Additional plot information shown in Appendix 3.

Figure 19. Three-dimensional visualization of RDA of Bray-Curtis dissimilarity constrained by leaf phosphorous mass (Pmass), mean annual temperature (temp), and freely ending veinlet density (FEVdensity). Host lineage boundaries are visualized, but not included as a variable in the RDA. Additional plot information is available in Appendix 3.

Figure 20. Plot of Tukey's Honestly Significant Difference Test performed on multivariate dispersion between lineages. Dispersion differed significantly between Lineage C and the other two Lineages. Dispersion did not differ significantly between Lineage A and Lineage B. Additional plot information is available in Appendix 3.

Figure 21. Host species distribution of cultured endophytic fungi based on Illumina® sequencing data. Of 99 sequenced isolates, 21 were successfully probed for in the Illumina® data. Isolate K.055.A3.1 represents the least host specific OTU and was found across all seven sampled species of Scaevola.

Figure 22. Maximum likelihood tree of 46 cultured OTUs using LSU. Tip labels are colored based on their presence or absence in the Illumina® data. Columns show presence or absence of cultured endophyte OTUs in the next gen sequencing data for sampled species of Scaevola. Branch labels indicate bootstrap support based on 100 bootstrap replicates.

Figure 23. Maximum likelihood tree of 46 cultured OTUs using LSU data. Tip labels are colored based on their presence or absence in the Illumina® data. Columns show presence or absence of cultured endophyte OTUs across islands.

Figure 24. Maximum likelihood tree of 46 cultured OTUs using LSU data. Tip labels are colored based on their presence or absence in the Illumina® data. Columns show presence or absence of cultured endophyte OTUs across temperature ranges (°C).

List of Tables

Table 1. PCR parameters for ITS amplification.

Table 2. List of the most frequently isolated endophytic species. Species level taxonomy assigned to each OTU by QIIME. Of 27 distinct species assignments, 13 were encountered more than once.

Introduction

Foliar endophytic fungi are a hyper-diverse and ecologically complex group of organisms that live symbiotically within the leaf tissues of plants. A single asymptomatic leaf can contain numerous fungal taxa with distinct evolutionary histories and ecological niches. Only recently have advances in high-throughput sequencing technologies and access to fungal sequence databases allowed for a thorough investigation of these minute foliar landscapes. This project utilized traditional culture-based methods and next-generation sequencing to examine foliar endophytic fungi in the native Hawaiian plant genus Scaevola. The unique distribution and evolution of Scaevola allowed for the elucidation of the relative influence of host genetics and abiotic environmental factors on endophyte community structure and evolutionary history.

Nature of The Association

One of the recent pioneers of the field, Elizabeth Arnold (2007), argues, "There is no better time to be an endophyte biologist." Fungal endophytes appear to be diverse, widespread, and ubiquitous. As such, they are the subjects of a rapidly growing body of research. Foliar endophytic fungi present a unique form of fungal symbiosis in that they are highly localized within the foliar tissue, transient in association, and are often host generalists (Saikkonen et al., 1998). The current consensus in the field is that foliar fungal endophytes are present in most healthy foliar tissue. The presence or absence of endophytic fungi can modulate disease severity, physiology, biochemistry, and prevalence of herbivory in host plants (Arnold & Engelbrecht, 2007; Busby et al., 2013; Rivera-Orduña et al., 2011; Van Bael et al., 2009). They are horizontally transmitted and vary in dispersal from rare to globally cosmopolitan. Davis et al. (2003) showed that endophytes of the genus Xylaria identified in liverworts were closely related both to each other and to endophytes collected from angiosperms in China, Puerto Rico, and Europe. Endophytic fungi have been isolated in extreme environments including the Sonoran desert and Antarctica (Massimo et al., 2015; Rosa, Almeida Vieira, Santiago, & Rosa, 2010). It is perhaps an understatement to say that fungal endophytes do not constitute a monophyletic group. Endophytic taxa fall out in a variety of distantly related fungal families and are present in two distinct phyla: Ascomycota and Basidiomycota. Perplexingly, they are often closely related to well-known pathogenic and saprotrophic taxa (Arnold & Lutzoni, 2007; Oono, Lefévre, Simha, & Lutzoni, 2015; U'ren et al., 2016). Other than their lifestyle (i.e. inhabiting plant tissue without causing symptoms of disease), there are few unifying characteristics of foliar endophytic fungi.

Environmental Factors

The relationship between geography and endophyte diversity has been a common theme in endophyte research during the past decade. Investigations up to this point have assessed variation in endophyte diversity and community structure along gradients of temperature, elevation, and geographic distance. The scale of these studies has varied widely. Hashizume et al. (2008) looked at the distribution of endophytic fungi inhabiting Quercus along elevational gradients on two mountains in Japan. Endophyte distribution was stratified based on both temperature and altitude at a relatively local scale. In a similar vein, researchers in Hawaii showed that among-site variation in endophyte communities of Metrosideros polymorpha correlated strongly with rainfall and temperature along an elevational gradient on Hawaii Island (Zimmerman & Vitousek, 2012). A much larger study showed that only 6% of endophyte genotypes were shared between similar host plants in boreal and arctic communities (Higgins, Arnold, Miadlikowska, Sarvate, & Lutzoni, 2007). This work indicates that broad-scale distribution patterns are also linked to climate. It would appear that climate, at the local and global scale, has a strong effect on which fungi will be present as endophytes. However, there are still ambiguities in this area of study, and many questions remain unanswered. What is the mechanism by which these environmental factors are affecting endophyte distribution? Are they presenting a direct barrier to dispersal and growth of the fungi or are they linked with changes in the host plant (e.g. leaf morphology) that favor different endophytic species?

Host Genotype

It has proven difficult to disentangle the relative effects of host genotype and environmental factors on endophyte community structure. Studies looking at the effect of host genotype on endophyte communities have yielded conflicting results. One study indicated that evolutionary similarity in Quercus species did not indicate similarity in endophyte communities (M. Hoffman, Gunatilaka, Ong, Shimabukuro, & Arnold, 2008). However, the results did indicate that endophytes showed host specificity at the genus level. A separate study showed that both geographic location and host plant species play a role in determining endophyte communities in Cupressaceae (M. T. Hoffman & Arnold, 2008). The authors did not test whether evolutionary similarity of the host species played a role, but they showed endophyte communities were significantly different between species. A study on foliar endophytes of Podocarpaceae in New Zealand compared the relative effects of host species and geographic separation (Joshee, Paulus, Park, & Johnston, 2009). The authors concluded that host species was the greatest determinant of endophyte communities while geographic separation played a significant but lesser role. Finally, it was shown that host plant taxonomy together with specific host functional traits and leaf morphology were significantly associated with phyllosphere fungal community structure (Kembel & Mueller, 2014). Though phyllosphere fungal communities include leaf surface fungi, leaf traits are likely crucial to understanding the community composition of endophytic fungi. In summary, each study identified a different factor as the best predictor of endophyte community composition. Though somewhat confounding, this is not entirely surprising given that the studies used different methods, different host plants, and were conducted in different geographic regions.

A more systematic attempt to understand the impact of host genotype, as it pertained to plant pathogens, was made by Busby et al. (2014). The researchers collected data on plant pathogen communities of Populus angustifolia grown in common gardens along elevational and distance gradients. They combined this data with a greenhouse inoculation experiment and data from wild individuals. The authors concluded that host genotype likely plays a stronger role in shaping plant pathogen communities locally (e.g. in a common garden) while environmental conditions play a greater role at the geographic scale (Busby et al., 2014). This result should be taken cautiously, as plant pathogenic fungi are likely under unique selection pressures compared to endophytes. However, many endophytic fungi are closely related to plant pathogens and could share similar dispersal patterns. The relative importance of host specificity and dispersal limitation for fungal endophytes remains an open question.

Evolution And Coevolution

Obvious coevolution between lineages of foliar endophytic fungi and their host plants does not appear to be occurring. One possible explanation is that foliar endophytic fungi are not wholly dependent on their hosts. At least some endophytic taxa appear to switch readily between endophytic and saprotrophic life histories. Recent phylogenetic work by U'ren et al. (2016) on endophytic Xylariaceae in North America indicated that many fungi isolated as foliar endophytes are closely related to fungi found living on other substrates, including dead plant material and termite mounds. Endophytism is potentially a transient, but advantageous, stage of life. In one study, Xylaria fruiting bodies growing as decomposers on the forest floor were shown to be of the same species (as determined by genetic similarity) as symbiotic Xylaria living as endophytes in the same forest plot (Thomas, Vandegrift, Ludden, Carroll, & Roy, 2016). While Xylaria individuals isolated as endophytes showed no sensitivity to local environmental conditions, the individuals isolated as decomposers appeared to be constrained by certain factors, including proximity to water. At least in the Xylariaceae, the evidence points to endophytes easily associating and dissociating with their host plants. This concept has led to the "Foraging Ascomycete Hypothesis," which views endophytism as a means for fungi to avoid sub-optimal environmental conditions and survive long enough to locate preferred substrates (Thomas et al., 2016).

Implications for Scaevola

In general, it was difficult to make confident predictions about Scaevola's endophytic communities due to the lack of existing data on tropical endophytes. All of the environmental and genetic factors listed above are potential avenues of analysis for the data this project produced. As mentioned above, it has been established that both geographic separation and host species can affect fungal endophyte community structure. However, host plant similarity within a single genus has not proven to be a good predictor of similarity in endophyte communities. Given that some Hawaiian Scaevola species have highly specific, non-overlapping ranges, it seems most likely that the impact of host genotype will be impossible to disentangle from environmental conditions and geographic distance between collection sites. That said, Scaevola presents some unique avenues for inquiry. Unlike many of Hawaii's endemic plant genera, Scaevola is the product of three separate introductions, each of which could harbor unique fungal endophytes (Howarth & Baum, 2005). Additionally, Scaevola species inhabit a wide variety of habitats and display a range of leaf morphologies. This is fairly typical of Hawaiian genera, which are characterized by rapid evolutionary radiations resulting from infrequent colonization events and new island formation (Funk & Wagner, 1995). Scaevola exhibit leaf morphology ranging from long, thin leaves on mountain species to small, succulent-like leaves adapted to dry coastal areas (McKown, Akamine, & Sack, 2016). Given the unique evolutionary history of Scaevola in Hawaii, it was possible that the endophyte communities of this genus would have similarly unique patterns of distribution. Finally, we expected that distinct endophytic taxa would be isolated from each island. At opposite ends of the archipelago, Kauai and the Big Island are distinctly different landscapes. From a geological perspective, Kauai was formed around five million years ago while the Big Island is much younger, at about half a million years old. They are separated by several islands and deep-water channels, which present significant barriers to dispersal by host plants and, possibly, fungi.

Another possible finding would be plant traits and host taxonomy driving fungal phyllosphere communities (Kembel & Mueller, 2014). Recent work by McKown et al. (2016) highlighted the convergent evolution of traits in Hawaiian Scaevola adapted to similar habitats. Distantly related Scaevola species inhabiting similar environments (as delineated by elevation, MAT, soil organic matter, etc.) had similar leaf traits. Given that leaf traits define the local environment for fungal endophytes, it is possible that endophyte communities may vary along with host leaf traits and habitat ranges. For this study, sampling was conducted across the Hawaiian archipelago in order to identify both host trait and environmental factors influencing endophytic communities. Leaf samples were taken from wild populations of 8 species of native Scaevola on Kauai, Oahu, and Hawaii, representing a range of habitats, leaf traits, and evolutionary histories (Figure 1).

Figure 1. A map showing sample sites for Hawaiian Scaevola species used in this project. Lines indicate regions with equal annual rainfall totals ranging from 250 mm/year at low elevation sites to 4000 mm/year at higher elevation sites. Lines indicate increments of 500 mm/year rainfall totals (rainfall data from Frazier, Giambelluca, Diaz, & Needham, 2015).

Specific Research Questions

One goal of this project is to assess whether the dispersal history of the host plant is correlated with its endophytic community. The Hawaiian archipelago is one of the most isolated island chains in the world. The endophytic fungi present in Scaevola were either dispersed along with their host plant or formed an association once the plant had established itself in Hawaii. Either scenario has the potential to yield unique endophyte communities for the different introductions. Although many endemic Hawaiian plant genera are the result of a single introduction followed by evolutionary radiation, members of the genus Scaevola were introduced on multiple occasions.

A phylogenetic study by Howarth et al. (2003) showed that Scaevola in Hawaii are likely the result of dispersal events from , , and possibly the Americas. The first two introductions accounted for a single species each, while the third resulted in an evolutionary radiation of 8 species. Scaevola taccada (Lineage B) is a widespread Pacific species and by far the most common Scaevola species in Hawaii. Scaevola glabra (Lineage A) represents a second introduction and is the sole tetraploid species known in the genus. The remaining members of the genus arose from a Hawaiian radiation (Lineage C) that is a sister clade to , a widespread species found primarily along the coasts of Africa and the Americas (Howarth et al., 2003). One hypothesis of this project was that endophyte communities would be structured by the evolutionary lineage of their host plants. Further investigation into the relationship between endophyte communities and their host plant's relatedness was made possible by a multi-gene phylogeny of Scaevola conducted by Howarth et al. (2005). They showed that the Hawaiian radiation of Scaevola consists of two distinct clades. One clade is comprised of two species specifically adapted to dry, arid environments. The second clade is comprised of three species found exclusively in wet forests. The two clades exhibit widely divergent phenotypes and have no overlapping ranges. In addition to these distinct lineages, there are two putative hybrid species, Scaevola procera and Scaevola kilaueae. The hybrid species are described as having one ancestor each from the two clades. This system made it convenient to test whether endophyte communities were more similar within the dry and wet clades or if the host plant clade has no effect on endophyte community structure. However, the relationships between Scaevola species suggested by Howarth et al. (2005) are somewhat speculative. The phylogenies were constructed using 4 genes, all of which had minimal genetic variation between species. Additionally, there was incongruence in the phylogenies produced by the separate genes. Though it was suggested this was due to homoploid hybridization events, there are alternative explanations, such as incomplete lineage sorting, that were impossible to rule out.

The evolutionary hypotheses proposed by Howarth et al. (2005) were corroborated, to some extent, by McKown et al. (2016) using leaf trait data. Though some leaf traits were significantly correlated with the evolutionary lineages, several leaf traits were strongly linked to environmental factors regardless of lineage. Given that leaf traits play a large part in determining the local habitat for foliar endophytes, we hypothesized that endophyte communities would be significantly dissimilar both between lineages and across environmental gradients.

Hawaiian Endophytes

In general, the endophytic communities of Hawaiian plants should be considered understudied. Investigation into the endophytes of native Hawaiian plants has thus far been limited to a single environmental sequencing study focusing on a single host species (Metrosideros polymorpha) on the island of Hawaii (Zimmerman & Vitousek, 2012). Research into the Hawaiian phyllosphere has become increasingly relevant with the sudden emergence of a fungal pathogen causing Rapid Ohia Death and recent findings showing that endangered Hawaiian tree snails have distinct dietary preferences for epiphytic fungi (O'Rorke, Holland, Cobian, Gaughen, & Amend, 2016).

The majority of studies on endophytic fungi have focused on easily accessible temperate regions and, as a result, research on tropical endophytes has lagged behind. Tropical studies have focused on the economically important crops Theobroma cacao and Coffea arabica (Mejía et al., 2014; Saucedo-García, Anaya, Espinosa-García, & González, 2014; Vega et al., 2010). The most recent effort to understand endophytic communities in native tropical habitats was conducted by Vincent et al. (2016) in contiguous native forest in Papua New Guinea. Thus far there have been no substantial efforts to examine endophyte diversity in the genus Scaevola or in any members of the family Goodeniaceae. As part of a larger grant to estimate endophyte diversity across several hundred species of Hawaiian dicots, this project contributed to our understanding of Hawaii's microbial ecology and evaluated the potential for future research focused on Hawaiian Scaevola.

Methodological Precedence

Methodologically, this project built on existing studies through a synthesis of culture-based methods and next-generation sequencing. Culture-based studies have the advantage of definitively isolating and preserving individual endophytic fungi. Numerous culture-based studies have been conducted in the past several years (Arnold et al., 2009; Gazis, Miadlikowska, Lutzoni, Arnold, & Chaverri, 2012; U'Ren, Lutzoni, Miadlikowska, Laetsch, & Arnold, 2012; U'ren et al., 2016; Vincent, Weiblen, & May, 2016). Methods for culturing are largely uniform, with only slight variation in growth media. Previous studies selected either potato dextrose agar or malt extract agar, though both are considered generic, broad-spectrum media. In terms of analysis, there are a variety of approaches for analyzing evolutionary and ecological data derived from culture libraries. Previous efforts opted to group isolates into operational taxonomic units (OTUs) by morphotype and/or ITS DNA sequence similarity in order to estimate something akin to species level diversity and distribution. Typically, additional sequencing of conserved genetic regions was used to place OTUs in a phylogenetic context (e.g. U'ren et al., 2016). Though popular, culture-based methods are labor intensive and inevitably miss a large portion of fungal diversity. Many fungi simply do not grow in culture or only grow under very specific environmental conditions. Even for culturable endophytes, growth rates can vary widely, causing some taxa to dominate initial isolations from plant material. The end result is that many endophytic taxa present in the leaf tissues are not recovered through culturing.
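The grouping of isolates into OTUs by ITS sequence similarity can be illustrated with a minimal Python sketch. This is a toy illustration only and not the pipeline used in this project: it assumes the sequences are already aligned to equal length, it uses a simple greedy seed-based assignment, and the function names (identity, greedy_otus, mutate_prefix) and the toy sequences are hypothetical. The 97% threshold is the one stated in the Abstract.

```python
from __future__ import annotations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences.

    Assumption: both sequences come from the same alignment, so they have
    equal length; columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def greedy_otus(seqs: dict[str, str], threshold: float = 0.97) -> dict[str, list[str]]:
    """Greedy clustering: each sequence joins the first OTU whose seed it
    matches at >= threshold identity; otherwise it seeds a new OTU."""
    otus: dict[str, list[str]] = {}
    for name, seq in seqs.items():
        for seed in otus:
            if identity(seqs[seed], seq) >= threshold:
                otus[seed].append(name)
                break
        else:
            otus[name] = [name]
    return otus


# Hypothetical toy data: a 100 bp "ITS" sequence and two variants.
SWAP = str.maketrans("ACGT", "CATG")  # every base changes to a different base


def mutate_prefix(seq: str, k: int) -> str:
    """Introduce exactly k mismatches at the start of the sequence."""
    return seq[:k].translate(SWAP) + seq[k:]


base = "ACGT" * 25
toy = {
    "iso_a": base,
    "iso_b": mutate_prefix(base, 2),  # 98% identical to iso_a
    "iso_c": mutate_prefix(base, 5),  # 95% identical to iso_a
}
clusters = greedy_otus(toy)
```

In this toy case iso_a and iso_b fall into one OTU while iso_c seeds its own. Greedy assignment depends on input order, which is one reason published pipelines typically sort sequences (for example by abundance or length) before clustering.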

Next-generation sequencing of DNA from environmental samples allows for a more comprehensive assessment of endophyte diversity in comparison to culture-based methods. Previous studies have used Roche® 454 pyrosequencing, Illumina®, and Ion Torrent® platforms to conduct high-throughput sequencing of both endophytic and soil-dwelling fungi (Jumpponen & Jones, 2010; Oono et al., 2015; Schmidt et al., 2013; U'Ren et al., 2014; Vandruff, 2014; Zimmerman & Vitousek, 2012). For this project, direct amplification and Illumina® sequencing of the fungal ITS region allowed for robust comparison of endophytic communities between host individuals.
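The figure captions for this project name Bray-Curtis dissimilarity as the measure used to compare endophytic communities between samples. As a minimal sketch of what that measure computes, the Python function below takes two samples as OTU count tables. The function name bray_curtis, the OTU labels, and the counts are hypothetical and chosen only for illustration; the project's own analysis was carried out in R (Appendix 3).

```python
def bray_curtis(u: dict, v: dict) -> float:
    """Bray-Curtis dissimilarity between two samples of OTU counts.

    BC = 1 - 2 * C / (S_u + S_v), where C is the sum over OTUs of the
    smaller of the two counts and S_u, S_v are the total counts per sample.
    0 means identical composition; 1 means no shared OTUs.
    """
    otus = set(u) | set(v)
    shared = sum(min(u.get(o, 0), v.get(o, 0)) for o in otus)
    total = sum(u.values()) + sum(v.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical read counts for two leaf samples.
sample_1 = {"OTU_1": 10, "OTU_2": 5}
sample_2 = {"OTU_1": 5, "OTU_3": 5}
d = bray_curtis(sample_1, sample_2)  # shared = 5, total = 25
```

Because the measure is based on raw counts, samples with very different sequencing depths are usually rarefied or normalized before comparison.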

Methods

Sample Collection And Processing

Sample collection and fungal isolate culturing followed previously described methods with some slight modifications (M. T. Hoffman & Arnold, 2008). Similar culturing methods were successfully employed in several previous studies (M. T. Hoffman & Arnold, 2008; Massimo et al., 2015; U'Ren, Lutzoni, Miadlikowska, & Arnold, 2010). Sample collection required harvesting from wild populations of Scaevola, rather than individuals in cultivation, to ensure that natural endophytic communities would be recovered. Approximately three healthy, asymptomatic leaves were collected from each plant. When possible, mature, shaded leaves were selected to increase the odds of recovering viable endophytic fungi. Leaves were cut at their base with sterilized hand shears. The number of leaves collected was varied, if necessary, to account for variance in leaf size. Leaves were stored in separate re-sealable plastic bags in a cooler during fieldwork. Samples were then transferred to a 4 °C refrigerator prior to surface sterilization. Surface sterilization was conducted within 48 hours of leaf collection and was completed in batches. Stems were removed from all leaves prior to sterilization. Leaves from a single individual were cut to an appropriate size, placed in labeled tea bags, and surface sterilized simultaneously in a series of large bowls or Ziploc® bags. Each set of samples was soaked and gently agitated for 1 minute in 1% bleach, 2 minutes in 70% ethanol, and finally washed for 2 minutes in ultrapure water or distilled water when ultrapure water was unavailable. For live culturing, leaves were removed from the tea bags, placed on a clean surface using sterile forceps, and inserted into separate 50 ml tubes of ultrapure or distilled water for storage during travel. Additional leaf material was placed in replicate tubes containing CTAB buffer for eventual DNA extraction and environmental PCR. Sufficient leaf material was placed in each tube such that 24 quarter-inch diameter subsamples could be derived from each sample. All tubes containing surface sterilized samples were stored at 4 °C until they could be processed for culturing or DNA extraction. Prior to storage, sample tubes containing CTAB were incubated overnight at room temperature.

Fungal Culturing

Leaf samples were removed from 50 ml storage tubes and placed in sterile petri dishes for subsampling. Each sample, representing an individual host plant, was subsampled 24 times using a sterilized 1/4" or 1/8" hole punch, depending on leaf size. In the case of smaller leaves or leaflets, subsamples were taken by cutting leaf material using a sterilized razor blade. All subsamples were taken from asymptomatic tissue, avoiding the leaf edge and vascular tissue, including the midrib and any large veins. Leaf subsamples were cultured on Malt Extract Agar (MEA) containing 30 g malt extract, 5 g peptone, 0.10 g chloramphenicol, and 15 g of agarose per liter. For initial culturing, 1 ml of MEA was pipetted into each well of a 24-well tissue culture plate. A single subsample of leaf material was placed in each well in order to prevent cross contamination between subsamples. Fungi growing out of the subsampled leaf tissue were then isolated and grown in axenic culture on 60 mm x 15 mm petri dishes, also containing MEA. Each isolate was then photographed, prior to DNA extraction, and prepared for long-term storage. In a laminar flow hood, small plugs of agar covered in hyphal tissue were taken from each cultured isolate using sterile transfer tubes. A total of 5 plugs from each isolate were stored in 2 ml cryovials containing 1 ml of autoclaved ultrapure water. The cryovials were then stored at 4 °C and periodically re-grown on MEA to assess culture health.

DNA Extraction, Amplification, and Sequencing

DNA extraction followed a standard method with no additional post-extraction filtration steps. An extraction solution was prepared containing 10 ml of 1 M Tris stock, 1.86 g KCl, 0.37 g EDTA, and 80 ml ultrapure water. This solution was then titrated to a pH of 9.5 - 10.0 using NaOH and then diluted up to 100 ml using ultrapure water. Once prepared, the extraction solution was filter sterilized and transferred to 2 ml centrifuge tubes. Dilution solution was prepared using 3 g of Bovine Serum Albumin (BSA) dissolved in 100 ml of ultrapure water. This solution was similarly filter sterilized and transferred to 2 ml centrifuge tubes. Extractions were carried out in a laminar flow hood using sterilized 8-well strip tubes. For each isolate, a small amount of mycelium was removed, using sterilized forceps, and added to 20 µl of extraction solution. Extractions were incubated at room temperature for 10 minutes, followed by an additional incubation at 95 °C for 10 minutes. Once incubation was complete, 20 µl of dilution solution was added to each extraction tube. Diluted extractions were stored at -20 °C.

For DNA amplification, 1:10 dilutions were prepared from the raw extracts described above. Dilutions were prepared in standard 96 well plates using 2 µl of raw extract and 18 µl of ultrapure water. All 1:10 dilutions were stored at -20 °C. Amplification of the fungal genes was carried out in a 96 well format in 25 µl reactions. Reaction mixtures contained 14.0 µl water, 2.5 µl 10x buffer, 2.0 µl dNTP, 1.25 µl of each primer, 0.125 µl taq, and 4.0 µl DNA template. The ribosomal Internal Transcribed Spacer (ITS) region was amplified and sequenced using the ITS1-F forward primer (CTTGGTCATTTAGAGGAAGTAA) and the ITS4 reverse primer (TCCTCCGCTTATTGATATGC) (Gardes & Bruns, 1993; White, Bruns, Lee, Taylor, & others, 1990). The fungal large subunit region (LSU) was amplified and sequenced using the primers LROR (ACCCGCTGAACTTAAGC) and LR7 (TACTACCACCAAGATCT) or LR5 (TCCTGAGGGAAACTTCG) (Vilgalys & Hester, 1990). Parameters for PCR amplification followed a standard program (Table 1). For LSU amplification, annealing temperature was reduced to 55 °C.

Table 1. PCR parameters for ITS amplification.

Stage   Cycles   Temperature (°C)   Time
1       1        95                 4:00
2       35       95                 0:30
                 58                 0:30
                 72                 1:20
3       1        72                 7:00
4       1        4                  ∞
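As a rough illustration of the program in Table 1, the sketch below encodes the stages as plain Python data and sums the programmed hold times. The names ITS_PROGRAM, LSU_PROGRAM, with_annealing, and hold_seconds are hypothetical and not part of the original protocol. The open-ended 4 °C hold is omitted, and ramp times between temperatures are ignored, so the figure is a lower bound on actual run time.

```python
# Thermocycler program transcribed from Table 1 (ITS amplification).
# Each stage: (number of cycles, [(temperature in deg C, hold time in seconds), ...]).
# The final 4 deg C hold is open-ended, so it is left out of the timing.
ITS_PROGRAM = [
    (1, [(95, 240)]),                      # stage 1: initial denaturation, 4:00
    (35, [(95, 30), (58, 30), (72, 80)]),  # stage 2: denature, anneal, extend
    (1, [(72, 420)]),                      # stage 3: final extension, 7:00
]


def with_annealing(program, new_temp, cycling_stage=1, step=1):
    """Return a copy of the program with the annealing temperature replaced."""
    copy = [(cycles, list(steps)) for cycles, steps in program]
    _, steps = copy[cycling_stage]
    steps[step] = (new_temp, steps[step][1])
    return copy


def hold_seconds(program):
    """Total programmed hold time, ignoring ramping between temperatures."""
    return sum(cycles * sum(sec for _, sec in steps) for cycles, steps in program)


# The text states that LSU amplification used a 55 deg C annealing step.
LSU_PROGRAM = with_annealing(ITS_PROGRAM, 55)

minutes, seconds = divmod(hold_seconds(ITS_PROGRAM), 60)
print(f"Programmed hold time: {minutes} min {seconds} s")
```

Under these assumptions the ITS and LSU programs have the same programmed duration, since only the annealing temperature differs.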

In order to verify ITS amplification, PCR products were taken from at least 12 wells from each plate and run on 1% agarose gels. Gels were stained with GelRed (Biotium®) and loaded with 3 µl of PCR product mixed with 2 µl of Blue/Orange (Promega®) loading dye for each sample. Gels were run at 80 volts for 45 minutes and visualized using a Bio Rad ChemiDoc XRS+ with Image Lab Software (BioRad®). Amplification success was assessed based on band intensity and amplicon length. PCR products were then sent to Genewiz® for clean-up and cycle sequencing. Amplified loci included both highly variable introns and conserved regions responsible for producing ribosomal RNA (Figure 2). Generated ITS sequences were primarily used for comparison with existing sequence databases while LSU sequences were used to construct phylogenetic hypotheses.

Figure 2. Illustration of ribosomal cassette showing conserved primer locations and approximate amplicon lengths for two relevant gene regions: nuclear ribosomal Internal Transcribed Spacer (ITS), and nuclear ribosomal Large Subunit (LSU).

Environmental PCR and Illumina® Sequencing

Preparation of samples for environmental PCR and Illumina® sequencing was conducted in the Amend lab at the University of Hawaii, Manoa. Preparation of the Illumina® library was conducted with direction from Dr. Anthony Amend and PhD candidate Gerald Cobian. For total DNA extraction, five 1/4" diameter subsamples of leaf tissue were taken from each sample. Total DNA was extracted from leaf tissue using the PowerPlant® Pro-htp 96 Well DNA Isolation Kit (MoBio®). Leaf tissues were homogenized in 2 ml tubes containing Lysing Matrix A (MoBio®) after adding the initial solutions from the PowerPlant® extraction kit (PD1, Phenolic Separation Solution, PD2, and RNAse A). Undiluted extracted DNA was used as template for fungal ITS amplification.

Unique barcoded fusion primers were used for each sample (see Appendix 1). Amplification was conducted using Kapa3G polymerase (KAPA3G Plant PCR kit, Kapa Biosystems®), an evolved polymerase for use in plant systems, and the primers ITS1F and ITS2 (Gardes & Bruns, 1993; White et al., 1990). Each 96-well plate contained one positive control containing fungal DNA and three negative controls containing ultra-pure water. Successful amplification of the ITS region was confirmed for each sample by running PCR products out on a 1.5% agarose gel stained with Gel Red™ (Biotium®). Reactions that yielded a single visible band at ~400 bp were considered successful. Any extractions that failed to amplify were subjected to magnetic bead cleanup and re-amplified with new barcoded primers. Samples that failed a second time were re-extracted from additional leaf material and re-amplified with new barcoded primers. All PCR products from successful reactions were subjected to bead cleanup in order to remove undesired products formed by primer dimers. The SPRI magnetic bead solution used for cleanup contained 0.1% carboxyl-modified Sera-Mag Magnetic Speed-beads, 18% PEG-8000, 1 M NaCl, 10 mM Tris-HCl (pH 8.0), and 1 mM EDTA (pH 8.0). Finally, all samples were normalized using the Just-a-Plate™ 96 PCR Purification and Normalization Kit. Samples were then combined into a single library and library quantitation was performed using the Qubit dsDNA HS Assay Kit and a Qubit™ 2.0 fluorometer (Life Technologies®). Sequencing was performed using the Illumina® V2 Reagent Kit and the Illumina® MiSeq platform at the Hawaii Institute of Marine Biology Core Facility. Due to an unforeseen error involving sequencing primers, a portion of the samples was not represented in the sequencing data from the initial Illumina® run. A second sequencing run was performed with additional sequencing primers added, and the assembled and filtered reads from both runs were combined.

Analysis

Analysis of Illumina® sequencing data was performed using components of the QIIME bioinformatics pipeline (Caporaso et al., 2010), Paired End reAd mergeR (PEAR; Zhang, Kobert, Flouri, & Stamatakis, 2014), Geneious®, and the Vegan package for the R program for statistical computing (Oksanen et al., 2016; R Core Team, 2016). PEAR was used to assemble forward and reverse reads from the raw Illumina® sequence data into contiguous sequences. FastQC (v.0.11.04) was used to assess assembled read quality and generate summary information for the raw sequencing data. The QIIME wrapper scripts validate_mapping_file, extract_barcodes, convert_fastaqual_fastq, filter_fasta, and split_libraries_fastq were used to identify assembled reads with valid barcodes and rename assembled reads with the appropriate sample name. These steps were performed for both Illumina® runs, and the resulting sequence files were concatenated. Chimeric sequences were identified and removed using vsearch (v.1.9.10) and the UNITE ITS dynamic reference database (v01.01.2016). Operational Taxonomic Units (OTUs) were assembled using the QIIME wrapper script "pick_open_reference_OTUs.py", which utilized BLAST (Altschul et al., 1990) to match sequences against a curated fungal sequence database (UNITE, v01.01.2016) and subsequently grouped sequences at 97% identity using UClust.
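To make the idea of grouping sequences at 97% identity concrete, the following is a minimal sketch of greedy centroid clustering. It is a simplified stand-in and not the UClust algorithm or the QIIME script itself: it assumes the toy sequences are already aligned and of equal length, and the names identity, greedy_cluster, and toy are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    seqs is a list of (name, sequence) pairs processed in order; a sequence
    that matches no existing centroid becomes the centroid of a new cluster.
    """
    centroids = []   # (name, sequence) of each cluster seed
    clusters = {}    # centroid name -> list of member names
    for name, seq in seqs:
        for cname, cseq in centroids:
            if identity(seq, cseq) >= threshold:
                clusters[cname].append(name)
                break
        else:
            centroids.append((name, seq))
            clusters[name] = [name]
    return clusters


# Toy 100-bp sequences (illustrative only, not real ITS data).
ref = "ACGT" * 25
toy = [
    ("ref", ref),
    ("near", "N" * 2 + ref[2:]),  # 2 mismatches: 98% identical to ref
    ("far", "N" * 5 + ref[5:]),   # 5 mismatches: 95% identical to ref
]
clusters = greedy_cluster(toy)
print(clusters)
```

Greedy approaches of this kind depend on input order, which is one reason real pipelines typically sort sequences (for example by abundance or length) before clustering.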

Processing of raw Illumina® data was conducted with assistance from Gerald Cobian using the Cray supercomputer at University of Hawaii, Manoa's High Performance Computing Facility. QIIME output files were then imported into the R program for statistical computing. Samples with duplicate names or low read counts (<1800 reads) were removed. Samples below the cutoff had two orders of magnitude fewer reads, which was considered to be the result of sequencing error. Further analyses were conducted using the community ecology statistics package Vegan (Oksanen et al., 2016). Samples were rarefied down to the minimum sequencing depth (1835 reads) and square root transformed. A Bray-Curtis dissimilarity matrix was calculated across samples and community dissimilarity was plotted using NMDS (vegan, ggplot2). Community dissimilarity was compared across elevation, Mean Annual Temperature (MAT), rainfall, host plant lineage, and host plant traits. Rainfall and temperature data were extracted from GIS data, made available by the UH Manoa Geography department (Frazier et al., 2015), using the packages raster, rasterVis, and rgdal (Bivand, Keitt, & Rowlingson, 2016; Hijmans, 2016; Perpiñán & Hijmans, 2016). Host plant trait data were derived from recently published data on wild populations of Hawaiian Scaevola (McKown et al., 2016).
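The rarefaction, square root transformation, and Bray-Curtis steps described above can be sketched in plain Python as follows. This is an illustration only, not the vegan code used in the study: the function names, OTU labels, and read counts are hypothetical, and rarefaction is assumed to mean random subsampling of reads without replacement.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from a dict of OTU counts."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the rarefaction depth")
    out = {}
    for otu in rng.sample(pool, depth):
        out[otu] = out.get(otu, 0) + 1
    return out


def sqrt_transform(counts):
    """Square root transform, which down-weights very abundant OTUs."""
    return {otu: math.sqrt(n) for otu, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den if den else 0.0


# Toy read counts for two host plants (illustrative values only).
rng = random.Random(42)
samples = {
    "plant_1": {"OTU_a": 60, "OTU_b": 30, "OTU_c": 10},
    "plant_2": {"OTU_a": 20, "OTU_b": 70, "OTU_d": 30},
}
depth = min(sum(c.values()) for c in samples.values())  # minimum sequencing depth
prepared = {name: sqrt_transform(rarefy(c, depth, rng)) for name, c in samples.items()}
d = bray_curtis(prepared["plant_1"], prepared["plant_2"])
print(f"Bray-Curtis dissimilarity at depth {depth}: {d:.3f}")
```

A value of 0 indicates identical communities and 1 indicates no shared OTUs; the resulting pairwise matrix is what an ordination such as NMDS takes as input.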

Analysis of cultured isolate Sanger sequencing data was conducted using the Perry lab’s workstation (3.2 GHz Core i5 iMac). Forward and reverse reads were assembled into contiguous sequences in Geneious®. Isolate sequences were grouped into 97% OTUs using the same QIIME script described above. Representatives from each culture based OTU were sequenced for an additional, more highly conserved locus (LSU). These representatives were then used to place cultured isolates within the evolutionary context of Ascomycota, using an alignment of LSU data generated by Schoch et al. (2009) as part of the Assembling the Fungal Tree of Life project. Sequences were aligned in Geneious® using MUSCLE with default settings (Edgar, 2004). Phylogenies were constructed in RAxML under a GTRGAMMA substitution model (Stamatakis, 2014). The R program for statistical computing was used to assess ecological patterns based on sample metadata and OTU assignment through the packages vegan, ggplot2, dplyr, tidyr, and ggtree (Wickham, 2009, 2016; Wickham & Francois, 2016; Yu, Smith, Zhu, Guan, & Lam, n.d.). Cultured endophyte ITS data was paired with Illumina® sequencing data by re-running the QIIME script "pick_open_reference_OTUs.py" using cultured isolate sequences as the reference database.

Results

Summary Data from Cultured Isolates

In total, 164 fungal isolates were recovered from 8 species of Scaevola (23 individuals) from the islands of Hawaii (Big Island) and Kauai. Nuclear ribosomal Internal Transcribed Spacer region (ITS) sequences were generated for 99 isolates, which were grouped into 46 Operational Taxonomic Units (OTUs) and assigned 27 species names. Based on automated taxonomic assignment, cultured endophytic fungi belonged to two known classes, Dothideomycetes and Sordariomycetes (Figure 3). For each OTU, ribosomal Large Subunit (LSU) sequences were generated from a representative isolate. Isolate metadata is listed in Appendix 1. A majority of isolate OTUs were isolated from a single individual and thus appeared to be rare or restricted in their distribution. A small minority of isolated endophytic fungi were recovered from three or more individuals and should be considered widespread generalists (Figure 4). Of the 46 recovered OTUs, 13 were isolated from multiple species of Scaevola. The two OTUs showing the greatest host generalism (EU552111 and JN135282) were isolated from 5 different species of Scaevola. The taxa most frequently encountered during sampling are shown in Table 2.

Figure 3. Graph showing distinct OTUs isolated from each species. Each bar is colored by QIIME assigned taxonomy of OTUs at the class level.

Figure 4. Number of Scaevola individuals each OTU was isolated from (total individuals successfully isolated from = 23). Most OTUs (29) were encountered a single time. The most common and widespread OTU (EU552111) was isolated from 11 individuals.

Table 2. List of the most frequently isolated endophytic species. Species level taxonomy assigned to each OTU by QIIME. Of 27 distinct species assignments, 13 were encountered more than once.

Assigned Species Level Taxonomy     Isolates   OTUs
Colletotrichum_gloeosporioides      20         6
Neofusicoccum_parvum                10         2
unculturedfungus                    8          6
Phomopsis_sp_H1                     7          3
Annulohypoxylon_stygium             6          2
Glomerella_cingulata                6          2
Glomerella_acutata                  5          2
Pestalotiopsis_sp_SD01ES            5          2
unculturedAscomycota                5          3
fungal_endophyte_sp_g83             4          2
Fusarium_solani_f_batatas           4          1
Diaporthe_cynaroidis                3          1
Hypocreales_sp_HLS104               3

Phylogenetics

Maximum likelihood trees, constructed using nuclear ribosomal large subunit data (LSU), corroborated automated taxonomic assignments at the class level. Phylogenies placed OTUs that remained unidentified following QIIME taxonomic assignment into known fungal lineages. All isolate OTUs belonged to class Sordariomycetes or Dothideomycetes, and several clades of closely related OTUs were revealed. Three genera assigned by QIIME were shown to be polyphyletic and many isolates unidentified at the genus level were shown to be members of commonly isolated genera. To assist with interpretation, the phylogeny including all isolated OTUs across Ascomycota has been divided into Figure 5 showing class Sordariomycetes and Figure 6 showing class Dothideomycetes. In order to assess the validity of species level identification, a smaller phylogeny was constructed focusing on the order Glomerellales, which included the most frequently isolated taxon, Colletotrichum gloeosporioides (Figure 7).

Figure 5. A maximum likelihood tree of all fungal isolate OTUs shown in the context of Ascomycota. Enlarged phylogeny on the right shows placement of isolates within Sordariomycetes. Isolates are colored by genus level QIIME assigned taxonomy. Branch labels indicate bootstrap support from 100 bootstrap replicates.

Figure 6. Enlarged phylogeny on the right shows placement of isolates within Dothideomycetes. Isolates are colored by genus level QIIME assigned taxonomy. Branch labels indicate bootstrap support from 100 bootstrap replicates.

Figure 7. A maximum likelihood tree of isolates assigned to order Glomerellales. Isolates are colored by species level taxonomy assigned by QIIME. Branch labels indicate bootstrap support from 100 bootstrap replicates.

Summary Data from Illumina® Sequencing

Next-generation sequencing data was derived from two Illumina® MiSeq runs. Scaevola samples were sequenced alongside samples from other Hawaiian plant genera not included in this study. The first run yielded 14,868,974 reads, of which 13,934,262 (93.714%) were successfully assembled. The second Illumina® run yielded 18,888,478 reads, of which 17,394,585 (92.091%) were successfully assembled. Chimera removal, library splitting, and concatenation of data from both Illumina® runs yielded 6,129,454 high quality reads that were assigned to 5,505 OTUs. Based on primer barcode sequences, reads were recovered from 491 plant samples, of which 26 represented samples taken from Scaevola. Sequencing of Scaevola samples yielded 633,716 reads, representing 10.33% of the total run. Reads associated with Scaevola samples were grouped into 1,277 OTUs, representing 23.20% of the OTUs recovered from the total run. Some samples were removed due to labeling errors that resulted in duplicate sample names. Finally, removing samples with low read counts (<1800 reads) left 15 high quality samples available for further analysis. This cutoff was chosen based on read count distribution across samples. Samples below the cutoff contained two orders of magnitude fewer reads, which was deemed to be a result of sequencing error. Each sample corresponded to a single host plant from one of the seven species of Scaevola sampled. and S. mollis were each represented by three samples, S. gaudichaudiana was represented by a single sample, and the remaining species were represented by two samples each.

Figure 8 provides an overview of fungal classes recovered by Illumina® sequencing. Taxonomy was determined by automated BLAST search against the UNITE fungal database. The two most diverse classes of endophytic fungi were Dothideomycetes and Sordariomycetes. The number of OTUs recovered from each species of Scaevola increased with sampling effort. Figure 9 illustrates the predominance of rare fungal OTUs. Of 1,277 OTUs encountered, 826 were encountered only in a single sample, making it difficult to infer range or host specificity.

Figure 8. Distinct OTUs recovered from each species through environmental PCR and Illumina® sequencing. Dominant classes of fungal endophytes included Dothideomycetes and Sordariomycetes. Many OTUs remained unidentified at the class level.

Figure 9. Histogram showing the number of plant individuals each OTU was recovered from. The vast majority of OTUs were not encountered in more than one individual. 826 OTUs were encountered a single time while 54 OTUs were encountered more than 3 times.

Diversity of OTUs from Illumina® Sequencing

Endophytic diversity was quantified by calculating Shannon diversity indices for each sample. Figure 10 shows variation in endophyte diversity between evolutionary lineages of Hawaiian Scaevola. Per-sample diversity did not differ significantly between lineages. Figures 11 through 14 show endophyte species accumulation curves for individual samples, Scaevola species, Scaevola lineage, and total sampling effort. Species accumulation curves are calculated by random resampling of the data and are an indicator of how comprehensively a given community was sampled. In each case, species accumulation did not plateau, indicating that sampling failed to completely recover endophytic communities.
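For reference, the Shannon index for a sample is H' = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to OTU i. The sketch below is a minimal illustration with made-up read counts. The text does not state the logarithm base, so the natural logarithm is assumed here, and the function name shannon is hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


# Two toy samples with the same number of OTUs but different evenness.
even_sample = [25, 25, 25, 25]   # reads spread evenly across four OTUs
skewed_sample = [97, 1, 1, 1]    # one dominant OTU

print(f"even:   {shannon(even_sample):.3f}")
print(f"skewed: {shannon(skewed_sample):.3f}")
```

The evenly distributed sample reaches the maximum possible value for four OTUs, ln(4), while the sample dominated by a single OTU scores much lower, which is why the index reflects evenness as well as richness.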

Figure 10. Plot showing Tukey HSD of Shannon diversity of endophytic community for samples from the different host lineages. There were no significant differences in the mean per-sample diversity of endophytic communities between the three host lineages.

Figure 11. OTU rarefaction curves by sample. Sequencing depth and OTU sampling effectiveness varied by sample. OTU accumulation did not plateau for any sample.

Figure 12. OTU rarefaction curves by host species. Sequencing depth and OTU sampling effectiveness varied by species. OTU accumulation did not plateau for any of the sampled host species.

Figure 13. OTU rarefaction curves by host lineage. Lineage A was comprised of S. glabra. Lineage B was comprised of S. taccada. Lineage C was comprised of S. mollis, S. gaudichaudii, S. gaudichaudiana, S. procera, and S. chamissoniana. Though Lineage C had the most sampling effort and highest number of OTUs, accumulation of OTUs did not plateau.

Figure 14. OTU rarefaction curve across all samples. Total sampling effort did not adequately recover the total endophytic community present in sampled Scaevola species.

Sample Dissimilarity

Samples were ordinated in two-dimensional space so that variation in fungal endophytic communities could be visualized. Ordinations are not a statistical test, but are essential for evaluation of trends in community composition. Community dissimilarity between samples is quantified and each sample is plotted such that the distance between points reflects community dissimilarity. Ordinations were performed using Non-Metric Multidimensional Scaling (NMDS) based on rarefied Bray-Curtis dissimilarity. NMDS ordinations did not indicate that endophyte communities were more similar within host species or within islands (Figure 15). However, visualizing samples by host evolutionary lineage and mean annual temperature at collection site revealed potential ecological patterns in community composition (Figures 16 & 17). Endophytic communities appeared to be more similar within the three lineages of Scaevola and within similar mean annual temperature ranges. Subsequent ordinations using Redundancy Analysis (RDA) further indicated mean annual temperature and host lineage were significantly correlated with endophyte community dissimilarity (Figure 18). However, multivariate dispersion was significantly different between the Hawaiian Scaevola radiation and the two smaller lineages (Figure 20). Differences in multivariate dispersion could explain why host lineage was identified as a significant variable. In addition to temperature and host lineage, leaf phosphorus content and freely ending veinlet density were identified as potentially important leaf traits by stepwise variable selection and RDA (Figure 19).

Figure 15. Non-metric multi-dimensional scaling (NMDS) visualization of Bray-Curtis dissimilarity of endophytic communities between samples. Samples are colored by host species and island of origin. Additional plot information is available in Appendix 3.

Figure 16. NMDS plot of Bray-Curtis dissimilarity of endophyte communities between samples. Samples are colored by host plant lineage with a shaded ellipse showing the 95% confidence interval for placement of samples in Lineage C. Additional plot information is available in Appendix 3.

Figure 17. NMDS plot of Bray-Curtis dissimilarity of endophyte communities between samples. Samples are colored by Mean Annual Temperature (TempFactor) at the collection site. Shaded ellipses show 95% confidence interval for placement of samples at different levels of MAT. Additional plot information is available in Appendix 3.

Figure 18. Three-dimensional visualization of redundancy analysis (RDA) of Bray-Curtis dissimilarity constrained by mean annual temperature (temp) and host lineage. Additional plot information is available in Appendix 3.

Figure 19. Three-dimensional visualization of RDA of Bray-Curtis dissimilarity constrained by leaf phosphorus mass (Pmass), mean annual temperature (temp), and freely ending veinlet density (FEVdensity). Host lineage boundaries are visualized, but not included as a variable in the RDA. Additional plot information is available in Appendix 3.

Figure 20. Plot of Tukey’s Honestly Significant Difference Test performed on multivariate dispersion between lineages. Dispersion differed significantly between Lineage C and the other two lineages. Dispersion did not differ significantly between Lineage A and Lineage B. Additional plot information is available in Appendix 3.

Combining Cultured Isolate Phylogenies and Illumina® Sequencing Data

In order to generate links between cultured endophytes and high-throughput sequencing data, Illumina® reads were probed for ITS sequences matching those of cultured endophytes. A total of 21 sequences generated from cultured endophytes matched sequences in the Illumina® data (Figure 21). LSU sequences for cultured isolates were used to generate phylogenies, which were subsequently annotated with distribution information indicated by the larger Illumina® dataset. Host specificity appeared to be similar for closely related taxa (Figure 22). Dispersal barriers between islands did not appear to be significant, with only 5 out of 21 taxa appearing to be restricted to a single island (Figure 23). Finally, although closely related taxa generally had similar temperature ranges, different endophyte genera appeared to have different temperature ranges (Figure 24).

Figure 21. Host species distribution of cultured endophytic fungi based on Illumina® sequencing data. Of 99 sequenced isolates, 21 were successfully probed for in the Illumina® data. Isolate K.055.A3.1 represents the least host specific OTU and was found across all seven sampled species of Scaevola.

Figure 22. Maximum likelihood tree of 46 cultured OTUs using LSU data. Tip labels are colored based on their presence or absence in the Illumina® data. Columns show presence or absence of cultured endophyte OTUs in the next gen sequencing data for sampled species of Scaevola. Branch labels indicate bootstrap support based on 100 bootstrap replicates.

Figure 23. Maximum likelihood tree of 46 cultured OTUs using LSU data. Tip labels are colored based on their presence or absence in the Illumina® data. Columns show presence or absence of cultured endophyte OTUs across islands.

Figure 24. Maximum likelihood tree of 46 cultured OTUs using LSU data. Tip labels are colored based on their presence or absence in the Illumina® data. Columns show presence or absence of cultured endophyte OTUs across temperature ranges (°C).

Discussion

This project was an initial inquiry into the ecology and evolution of endophytic fungi in the previously un-investigated plant genus Scaevola. The combination of culturing and Illumina® sequencing methods allowed for both community and species level analysis. Culturing and Illumina® sequencing data were analyzed independently and through novel methods combining both datasets. Foliar fungal endophytes of Hawaiian Scaevola are extremely diverse. In total, 1,277 OTUs were recovered by high-throughput environmental sequencing, and 46 OTUs were recovered through culturing. Based on high-throughput sequencing, Hawaiian Scaevola contained endophytic fungi from 17 classes, 42 orders, and 92 families. Lack of asymptotic rarefaction curves at the sample, species, and whole dataset level indicated that the natural endophytic community was not completely sampled (Figures 11-14). It is likely that the endophytic community of Scaevola is even more diverse than our sampling suggests. The endophytic genera that were commonly encountered in culture differed substantially from those encountered in the Illumina® sequencing data (see Appendix 1). Culture based methods were only able to isolate endophytic fungi belonging to two classes within phylum Ascomycota: Sordariomycetes (Figure 5) and Dothideomycetes (Figure 6). The majority of cultured OTUs (29 out of 46) were encountered only once, while one generalist OTU was isolated from 11 separate individuals (Figure 4). Phylogenies constructed from cultured isolate sequence data indicated that cultured endophytes were strongly represented in a few genera and relatively rare across many others.

Endophytic fungi isolated in culture belonged to genera that also contained known plant pathogens and endophytic taxa. The most frequently encountered genera in cultured isolates included Xylaria, Glomerella/Colletotrichum, Diaporthe, and Botryosphaeria. The most frequently encountered genera in the Illumina® sequencing data included Colletotrichum/Glomerella, Mycosphaerella, Pseudocercospora, Byssochlamys, Stenella, Derxomyces, Rachicladosporium, and Ramichloridium. Notably, ten percent of the OTUs recovered by high-throughput sequencing belonged to phylum Basidiomycota while culture based methods failed to capture any members of Basidiomycota. Within the Illumina® dataset, more fungal sequences remained unidentified at the genus level than were present in the most commonly identified genera. The prevalence of unidentified fungal sequences in the Illumina® data is an indication of high-throughput sequencing’s limited specificity. Though high-throughput methods were more powerful at assessing endophytic diversity, the accurate assignment of taxonomy to these reads remained a struggle.

As expected, there was overlap between the fungal sequences generated from cultured isolates and sequences in the Illumina® data. This overlap allowed us to link a portion of the Illumina® reads with physical collections and a robust phylogenetic framework. Predictably, a large portion of overall endophytic fungal diversity was missed by culture-based methods. However, 21 of the 46 OTUs that were encountered in culture were not recovered via high-throughput sequencing (Figures 21-24). Some of these taxa were likely missed as a result of variability in clustering due to sequence length. In order to speed up overall processing time, reference based identification of sequences relied on BLAST searches of randomly selected sequences, followed by a round of clustering at 97% similarity. ITS sequences generated by Sanger sequencing of cultured isolates were much longer than those generated by high-throughput sequencing. The longer Sanger sequences may have recovered more sequence variability, thereby increasing the number of cultured OTUs. To test this, Illumina® reads that clustered with isolate BI.245.2.D2.1 were extracted and aligned using MUSCLE (Edgar, 2004) as implemented in Geneious®. Pairwise similarity to the Illumina® reads was found to be identical for two closely related cultured isolate OTUs (BI.245.2.D2.1 & BI.245.1.A6.1). Although these two isolates were placed in separate OTUs based on full length Sanger sequences, they were identical in the region that was used to probe the Illumina® data. As a result, the Illumina® reads were arbitrarily clustered around one isolate sequence (BI.245.2.D2.1) while giving the appearance that the other isolate (BI.245.1.A6.1) was absent in all samples.

In addition to limitations in our bioinformatics pipeline, certain rare taxa that were isolated in culture may have failed to amplify under the different PCR conditions used for Illumina® library construction. PCR amplification of cultured endophytic fungi used DNA extracted from individuals. In contrast, environmental sequencing required PCR to be performed on samples containing many fungal taxa with differential affinity for the selected primers. This amplification bias is also known to skew read counts, making them a poor approximation for actual microbial abundance (Amend, Seifert, & Bruns, 2010). Additionally, there may have been some inhibition of PCR by plant compounds that remained in the leaf DNA extracts. The heterogeneous nature of environmental samples could easily lead to PCR selectively amplifying certain taxa and not others.
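The effect of sequence length on OTU assignment can be illustrated with a toy example. The sketch below uses two invented 200-bp sequences, not the actual isolate sequences, and assumes they are already aligned. Over their full length they fall below the 97% threshold, yet they are identical across the shorter region that stands in for an Illumina® amplicon. All names and values are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


THRESHOLD = 0.97      # OTU clustering threshold used in the text
SHORT_REGION = 100    # assumed length of the region covered by short reads

shared = "ACGT" * 25                  # region covered by both data types
tail = "ACGT" * 25                    # region covered only by long reads
variant_tail = "TGCATGCA" + tail[8:]  # 8 substitutions in the long-read-only region

isolate_1 = shared + tail
isolate_2 = shared + variant_tail

full = identity(isolate_1, isolate_2)
short = identity(isolate_1[:SHORT_REGION], isolate_2[:SHORT_REGION])

print(f"full-length identity:  {full:.2%} -> same OTU: {full >= THRESHOLD}")
print(f"short-region identity: {short:.2%} -> same OTU: {short >= THRESHOLD}")
```

In this situation a short read matches both reference sequences equally well, so assigning it to one of them is arbitrary, which is consistent with the pattern described for isolates BI.245.2.D2.1 and BI.245.1.A6.1.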

Phylogenetic Placement and Evolutionary Hypotheses

Culturing endophytic fungi allowed us to generate DNA sequences for multiple loci and visually document culture morphology. This study represents the first known effort to collect and preserve endophytic fungi from the genus Scaevola, and the family Goodeniaceae. Additionally, cultured isolates provided a means of testing the efficacy of taxonomic assignment by our bioinformatics pipeline. OTU clustering and taxonomic assignment of ITS sequences from cultured isolates were performed using the same Next-Gen sequencing pipeline (QIIME/UNITE) that was used to analyze our Illumina® data. A representative isolate from each OTU was selected and sequenced for the nuclear ribosomal large subunit (LSU) locus, which allowed for phylogenetic placement within the phylum Ascomycota. The phylogenetic context of the isolates was then compared with QIIME taxonomic assignment (Figures 5 & 6). Our QIIME based pipeline used standard methods for OTU clustering and a curated database of fungal sequences (UNITE). However, despite the use of a fungal specific sequence database, there were apparent discrepancies between QIIME assigned taxonomy and our LSU phylogeny. Endophytic isolates identified as belonging to the genera Phomopsis, Pestalotiopsis, and Diaporthe were polyphyletic in the phylogenetic analyses, indicating possible misidentification at the genus level. In total, 26 isolates (comprising 18 OTUs) were classified as ‘Unidentified’ at the genus level by our pipeline. Phylogenetic placement indicated these unidentified isolates represented close relatives of Guignardia, Davidiella, Delphinella, Sporormiella, Phaeodothis, Pseudonectria, Diaporthe, Phomatospora, Xylaria, and Cladosporium. Eight of these genera would have gone unreported had we relied solely on QIIME taxonomic assignment.

The Dominant Genus Colletotrichum

Based on QIIME-assigned taxonomy, Glomerella cingulata (Anamorph: Colletotrichum gloeosporioides; Order: Glomerellales) was the most commonly isolated taxon in culture. Of the ten most abundant OTUs in the Illumina® data, four were identified as Colletotrichum species. Colletotrichum is a broad, frequently encountered genus containing numerous plant pathogenic strains. Members of the genus Colletotrichum have been isolated as endophytes in every major group of angiosperms sampled thus far (Cannon, Damm, Johnston, & Weir, 2012). Research on Colletotrichum endophytes of Theobroma cacao has shown they have pervasive effects on host plant gene expression and are closely related to pathogenic strains within the same species (Mejía et al., 2014; Rojas et al., 2010). As pathogens, they are notable for forming latent infections that remain undetected prior to harvest (Dean et al., 2012). Based on our results, Colletotrichum species are likely dominant members of the foliar endophytic communities of Hawaiian Scaevola. We therefore elected to focus on this genus for further phylogenetic analysis. Colletotrichum was recently ranked at number eight in a list of the top ten plant pathogenic genera and is an economically important pathogen of , vegetables, and ornamentals (Dean et al., 2012). Despite its recognized importance in the field of , Colletotrichum has long suffered from taxonomic and ecological uncertainty. In a recent review of Colletotrichum taxonomy, Cannon et al. (2012) write, “Communication of information relating to Colletotrichum species has been seriously compromised in the past by misidentification, misapplication of names and grossly differing species concepts.” Additionally, many Colletotrichum species are genetically identical to species described as Glomerella (Réblová, Gams, & Seifert, 2011). This confusion stems from the two-name problem in mycology, whereby the asexual and sexual stages of a taxon were given distinct names. 
This practice has since been abolished, with the sexual stage typically taking taxonomic precedence. However, the abundance of research on the asexual stage, Colletotrichum, has compelled researchers to conserve the asexual name. Prior to 1957, the genus Colletotrichum included 750 species. After host species identity was determined to be an insufficient criterion for establishing new species concepts, the genus was collapsed into a much tidier 11 species aggregates (von Arx, 1957). As of
2012, one particularly widespread species, Colletotrichum gloeosporioides, was determined to have approximately 600 synonymous published names (Cannon et al., 2012). The abundance of research attention and taxonomic confusion has led to a large number of “Colletotrichum” and “Glomerella” sequences being uploaded to public databases. However, the abundance of sequence data for Colletotrichum/Glomerella isolates in GenBank® (3495 ITS sequences for “Glomerella cingulata”) is more hindrance than help. Many sequences in GenBank® are incorrectly identified or were themselves identified by BLAST searches against GenBank®. This has led to a steady degradation of the species concept. As an assessment of phylogenetic diversity within the species concept, 100 random ITS sequences identified as “Glomerella cingulata” were selected from GenBank®. All sequences ranged from 500 to 700 base pairs in length. After alignment, calculated pairwise sequence similarity ranged from 46% to 100%, indicating the prevalence of clear and egregious misidentification. Efforts to accurately identify Colletotrichum species using molecular methods are therefore dependent on curated sequence databases associated with expert-identified physical collections. In order to assess QIIME taxonomic assignment at the species level, we constructed a maximum likelihood phylogeny of endophytic isolates in the order Glomerellales (Figure 7). A robust phylogenetic backbone was provided using sequences from expert-identified collections (Réblová et al., 2011). Additionally, each isolate sequence was BLAST searched against GenBank®, and named sequences with >90% pairwise similarity were included in the analysis. Based on our phylogeny, 5 out of 9 OTUs in the genus Colletotrichum/Glomerella were misidentified by our QIIME-based pipeline at the species level. 
Four OTUs were grouped with Glomerella acutata, two with Colletotrichum boninense, two with Colletotrichum gloeosporioides, and one fell out sister to the basal species Colletotrichum dracaenophilum. The polyphyly of endophytic Colletotrichum isolated from Scaevola suggests multiple introductions of Colletotrichum to Hawaii. Determining whether our cultured isolates fit within previously described species concepts will require additional morphological examination. The abundance of Colletotrichum isolated from Scaevola makes the genus a good candidate for future investigation into population genetics. Though it is likely that these endophytes are morphologically similar to previously described species, sequencing indicated substantial genetic diversity within the recovered isolates (9 distinct OTUs). It remains to be seen whether the strains isolated from Hawaiian Scaevola are genetically isolated from the global Colletotrichum population.
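The range of pairwise similarity reported for the GenBank® sequences is a simple summary of an alignment. As a rough illustration of the calculation, the Python sketch below computes percent identity for every pair of aligned sequences and reports the lowest and highest values. It is a minimal sketch under stated assumptions: the sequences are invented, the function names are hypothetical, and the identity definition (matching columns divided by compared columns, skipping columns where both sequences have a gap) is one common convention that may differ from the one applied by Geneious®.

```python
from itertools import combinations


def pairwise_identity(a, b, gap="-"):
    """Percent identity over aligned columns, skipping columns where both sequences have a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue  # column is uninformative for this pair
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_range(alignment):
    """Lowest and highest pairwise identity among all sequence pairs in the alignment."""
    values = [
        pairwise_identity(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    ]
    return min(values), max(values)


# Hypothetical aligned sequences; seq3 stands in for a misidentified record.
alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAC",
    "seq3": "ACGTTTTTTT",
}

low, high = identity_range(alignment)
print(low, high)  # 50.0 100.0
```

A wide gap between the lowest and highest values, as in this toy alignment, is the pattern that points to misidentified records sharing a single name.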

Community Ecology

Despite limitations in taxonomic accuracy, high-throughput sequencing remains the best method for identifying large-scale ecological trends in endophytic communities. A major goal of this study was to identify candidate factors that may affect endophyte community structure in Scaevola. Given the lack of previous work on Hawaiian Scaevola, sampling was conducted in the form of an exploratory survey. In recognition of the limitations inherent in this type of sampling, analysis of ecological data was restricted to ordinations yielding qualitative information. The primary objective of these analyses was to identify potential trends that could be further investigated through structured experimentation. Our initial hypothesis was that endophytes of Hawaiian Scaevola would show some degree of host species specialization and dispersal limitation between islands. To test this, endophyte communities were compared across samples using Bray-Curtis dissimilarity. This yielded a matrix of dissimilarity values for every pair of samples. A Bray-Curtis dissimilarity matrix is often thought to be analogous to a distance matrix, though as Legendre and Anderson (1999) note, this is not entirely accurate. It is possible, however, to plot Bray-Curtis values visually using non-metric multidimensional scaling (NMDS), which converts dissimilarity into Euclidean distance on a two- or three-dimensional plot. Samples are initially scattered randomly and their arrangement is adjusted iteratively until the Euclidean distance between samples reflects the dissimilarity of their respective communities. Samples that are more similar to each other fall out closer together while samples further apart are more dissimilar. Based on NMDS plots of Bray-Curtis dissimilarity (Figures 15-17), it appears that host species and island of origin are unlikely to be significant drivers of endophyte community composition. Samples did not obviously cluster by host species and samples collected from different islands were frequently plotted in close proximity. 
However, samples tended to visually cluster based on Mean Annual Temperature (MAT) at the collection site and by host evolutionary lineage. Unfortunately, sampling was not sufficient to examine differences between wet and dry clades within the Hawaiian radiation of Scaevola. Redundancy Analysis (RDA) was employed to rapidly determine additional variables that correlated with community composition. RDA is similar to NMDS, with added steps to calculate whether a variable or group of variables correlates with community dissimilarity. Sample metadata, such as host lineage, temperature, or leaf phosphorus content, are converted into a Euclidean distance matrix and ordinated alongside sample dissimilarity. RDA then tests whether the positions of samples based on metadata are significantly correlated with their positions based on community dissimilarity. Though not definitive, RDA provides a quantitative means of identifying possible drivers of community dissimilarity. RDA model selection was performed in a stepwise manner using the function ordistep in the package Vegan (Oksanen et al., 2016). The first RDA was restricted to sample metadata related to environmental factors and host lineage (Figure 18). Stepwise model selection found that host lineage (corresponding to introduction history) and MAT were significantly correlated with sample dissimilarity. It is therefore likely that endophytic communities of Scaevola are influenced both by their hosts’ evolutionary history and by the habitat in which their host plant is growing. In other words, a combination of host genetic and environmental factors influences endophytic community composition in Hawaiian Scaevola. RDA is prone to assigning significance to groups that have significantly different multivariate dispersion. Thus, host lineage may have been selected as a significant factor on the basis of dispersion, and not because endophyte communities were significantly different between lineages. 
To test this, dispersion of Bray-Curtis dissimilarity within host plant lineages was calculated using the betadisper function in Vegan (Oksanen et al., 2016) and differences between groups were examined using Tukey’s Honestly Significant Difference test (Figure 20). Dispersion was found to be significantly different between groups, which means we cannot confidently say that the three lineages harbor distinctly different communities. However, the difference in multivariate group dispersion has interesting ecological implications. Multivariate dispersion of community dissimilarity has become an accepted estimate of beta-diversity, also known as regional diversity (Anderson, Ellingsen, & McArdle, 2006). Rather than comparing diversity at the sample level (i.e. alpha diversity), beta-diversity assesses the larger community encompassed by all samples within a group. Beta-diversity was significantly higher in the multi-species radiation (Lineage C) compared to the two single-species lineages (Lineages A & B). This could be the result of limited sampling of the two smaller lineages. It is possible that if more samples of the smaller lineages were added, their dispersion would match that of Lineage C. However, there is also a plausible biological explanation for the difference in beta-diversity between host lineages. Lineage C contains seven different species (5 of which were included in this analysis) living in a range of habitats. In contrast, Lineages A and B each contain a single species restricted to a single habitat. Our results indicate that the largest lineage with the broadest habitat range has the highest beta-diversity. It is possible that, as members of Lineage C radiated into new habitats, the overall diversity of endophytic taxa in that lineage increased. This would imply that Hawaiian plant genera rapidly associated with new endophytic taxa as they colonized new habitats. 
In an effort to identify differences in host phenotype that may correlate with foliar endophytic community composition, we incorporated leaf trait data for each host species from McKown et al. (2016) into stepwise RDA selection. When provided with these additional data, the stepwise model selection function ordistep (Vegan; Oksanen et al., 2016) selected leaf traits related to freely ending veinlet density and phosphorus content together with mean annual temperature as the combination of variables most significantly correlated with endophyte community dissimilarity. Interestingly, the significance and reduction in variance (an indicator of model goodness of fit) for the leaf trait RDA were similar to those of the RDA performed using host lineage and mean annual temperature. The joint significance of mean annual temperature and phosphorus mass in leaf tissue is not surprising, given that both are known to decrease with higher elevation. In contrast, minor vein density varies by Scaevola species but is not correlated with lineage or environment. It is plausible that higher minor vein density increases the abundance of endophytes specific to vascular tissues.
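The Bray-Curtis dissimilarity that underlies the ordinations in this section is straightforward to compute by hand: for two samples, it is the sum of absolute differences in OTU read counts divided by the total read count of both samples. The Python sketch below is only an illustration of that formula. The analyses in this study were run in R with the Vegan package; the OTU counts, sample names, and function names here are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("count vectors must list the same OTUs in the same order")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(table):
    """Square matrix of pairwise Bray-Curtis values, with rows and columns in sorted sample order."""
    names = sorted(table)
    matrix = [[bray_curtis(table[r], table[c]) for c in names] for r in names]
    return names, matrix


# Hypothetical OTU table: read counts for three OTUs in four samples.
otu_table = {
    "sample_1": [10, 0, 5],
    "sample_2": [10, 0, 5],   # same community as sample_1
    "sample_3": [0, 15, 0],   # shares no OTUs with sample_1
    "sample_4": [5, 5, 5],    # partial overlap with sample_1
}

names, matrix = dissimilarity_matrix(otu_table)
print(matrix[0][1])  # 0.0: identical communities
print(matrix[0][2])  # 1.0: no shared OTUs
print(round(matrix[0][3], 3))  # 0.333: partial overlap
```

Values run from 0 (identical communities) to 1 (no shared OTUs), and it is this matrix, rather than the raw read counts, that an ordination such as NMDS attempts to represent as distances on a plot.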

In summary, endophytic fungal communities of Scaevola do not appear to be limited by dispersal across islands or hosts, but are potentially related to mean annual temperature, host evolutionary history, and specific leaf traits. Mean annual temperature is likely a key environmental variable and was previously found to correlate with endophyte community structure in both Hawaiian Metrosideros (Zimmerman & Vitousek, 2012) and Japanese Quercus (Hashizume et al., 2008). Compared to environmental and host genetic factors, the potential effects of leaf traits on endophytic communities have received relatively little attention. The only previous study conducted in the tropics found that leaf nutrients (nitrogen, calcium, and aluminum content), growth rate, mortality rate, and mass per area significantly correlated with tropical phyllosphere fungal community structure (Kembel & Mueller, 2014). This project is the first to report leaf phosphorus content and vein density as potentially influential factors in endophyte community composition.
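The comparison of group dispersion discussed in this section can also be illustrated with a simplified calculation. The Python sketch below computes the mean pairwise Bray-Curtis dissimilarity among the samples of each group. This is a deliberately simple proxy and an assumption of this example: the betadisper function in Vegan instead measures the distance of each sample to its group centroid in principal coordinate space, and the read counts, sample names, and lineage labels used here are hypothetical.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def within_group_dissimilarity(table, groups):
    """Mean pairwise dissimilarity among the samples of each group.

    Groups with fewer than two samples are skipped because they have no pairs.
    """
    members = {}
    for sample, group in groups.items():
        members.setdefault(group, []).append(sample)
    result = {}
    for group, samples in members.items():
        pairs = list(combinations(sorted(samples), 2))
        if pairs:
            result[group] = mean(bray_curtis(table[a], table[b]) for a, b in pairs)
    return result


# Hypothetical read counts for three OTUs.
otu_table = {
    "a1": [10, 0, 0],
    "a2": [10, 0, 0],
    "c1": [10, 0, 0],
    "c2": [0, 10, 0],
    "c3": [0, 0, 10],
}

# Hypothetical assignment of samples to host lineages.
lineage = {
    "a1": "lineage_A", "a2": "lineage_A",
    "c1": "lineage_C", "c2": "lineage_C", "c3": "lineage_C",
}

spread = within_group_dissimilarity(otu_table, lineage)
print(spread)  # {'lineage_A': 0.0, 'lineage_C': 1.0}
```

In this toy data set the samples of one lineage are identical while those of the other share no OTUs, so the two groups differ in spread even before any difference in average composition is considered. This is the situation in which a significant group effect in an ordination-based test is hard to interpret.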

Synthesis of Culture and Environmental Sequencing Data

A side objective of this project was to develop a novel method for combined analysis of cultured isolates and high-throughput sequencing. Though Illumina® sequencing produced orders of magnitude more data, it provided very little insight into the biology of specific fungal species. Culturing allowed for more definitive taxonomic placement, but was not a reliable means of determining presence or absence of a species within a sample. Using sequences generated from cultured isolates as a reference database allowed Illumina® reads to be paired with physical collections and robust phylogenies. Cultured isolate phylogenies were greatly informed by probing the environmental sequencing data (Figures 21-24). More populated clades, such as those within Glomerellaceae and Xylariaceae, revealed distinct environmental ranges for closely related isolates. While some isolates were apparent generalists, others appeared to be rare, and possibly under some form of habitat restriction (e.g. K.070.C6). One of the fungal isolates (K.055.A3.1) appears to be a common host generalist and was found across all seven species of Scaevola. The majority of cultured isolate OTUs identified in the Illumina® data were found in more than one host species and on more than one island. Four isolates were potentially host and island specific (K.070.C6, BI.185.2.B6, BI.191.A6, and BI.181.2.A1). None of the cultured isolates appeared to be unique to the extreme ranges of mean annual temperature and all but two cultured isolates were found in multiple temperature ranges. Habitable temperature ranges can be judged based on the maximum and minimum temperature recorded for each isolate. Though not a comprehensive sampling, these data constitute the first description of ecological niches for specific species of Hawaiian endophytic fungi. Overall, the utility of culturing alongside environmental sequencing cannot be overstated. 
Though recent research has focused on rapid data accumulation via high-throughput sequencing, these data are impossible to analyze without a solid taxonomic knowledge base. While curated fungal sequence databases are currently under construction (e.g. UNITE; Abarenkov et al., 2010), their utility is limited in understudied geographic regions. Even commonly isolated fungal genera, such as Colletotrichum, are desperately in need of additional species-level research (Cannon, Damm, Johnston, & Weir, 2012). Physical culture collections enable the construction of a local reference database, allowing some portion of high-throughput sequences to be accurately associated with specific local species. Rather than viewing culturing and high-throughput sequencing as independent methods, they should be considered as two essential parts of a comprehensive survey. Though the majority of environmental sequences will not match cultured isolates, it is important to begin the work of understanding species-level trends in fungal endophyte ecology.
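The per-isolate summaries described in this section (number of host species, number of islands, and the minimum and maximum temperature at which an isolate was detected) amount to a simple aggregation over detection records. The Python sketch below shows one way such a summary could be tabulated. It is a sketch only: the record layout, isolate and host labels, temperature values, and function names are hypothetical and do not reproduce the data or scripts used in this study.

```python
def summarize_isolates(detections):
    """Summarize hosts, islands, and temperature range per isolate.

    detections: iterable of (isolate, host_species, island, mat) tuples,
    one per sample in which the isolate's OTU was detected.
    """
    summary = {}
    for isolate, host, island, mat in detections:
        entry = summary.setdefault(
            isolate,
            {"hosts": set(), "islands": set(), "mat_min": mat, "mat_max": mat},
        )
        entry["hosts"].add(host)
        entry["islands"].add(island)
        entry["mat_min"] = min(entry["mat_min"], mat)
        entry["mat_max"] = max(entry["mat_max"], mat)
    return summary


def is_potentially_restricted(entry):
    """True when an isolate was detected in a single host species on a single island."""
    return len(entry["hosts"]) == 1 and len(entry["islands"]) == 1


# Hypothetical detection records (mean annual temperature in arbitrary units).
detections = [
    ("isolate_X", "host_1", "island_1", 14.5),
    ("isolate_X", "host_2", "island_2", 21.0),
    ("isolate_X", "host_1", "island_1", 17.2),
    ("isolate_Y", "host_3", "island_1", 19.8),
]

summary = summarize_isolates(detections)
for name in sorted(summary):
    entry = summary[name]
    print(name, len(entry["hosts"]), len(entry["islands"]),
          entry["mat_min"], entry["mat_max"], is_potentially_restricted(entry))
```

An isolate flagged by `is_potentially_restricted` is only a candidate specialist; with few detections, apparent restriction may simply reflect limited sampling.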

Future Directions for Scaevola

Hawaiian Scaevola should be considered well suited for future research on foliar endophytic community structure. Their extensive habitat range allows for comparisons of community structure across broad environmental gradients. As a study system, Hawaiian Scaevola species have the potential to reveal insights into the relative effects of leaf traits, evolutionary lineage, and environmental factors under controlled experimental conditions. The recent publication of extensive leaf trait data for Hawaiian Scaevola has greatly expanded the potential for designing greenhouse studies to test the relationship between leaf traits and endophytic fungi at both the community and species level (McKown et al., 2016). Further high-throughput sequencing of dry habitat species, such as Scaevola coriacea and Scaevola kilaueae, and wider sampling across the archipelago could provide additional support for the correlation between leaf phosphorus content, mean annual temperature, and endophytic community dissimilarity. Finally, in-depth morphological description of cultured endophytes, particularly of isolates that were distantly related to known species, could provide novel reports of species distributions or even evidence for endemic Hawaiian endophytic species.

References

Abarenkov, K., Nilsson, R. H., Larsson, K. H., Alexander, I. J., Eberhardt, U., Erland, S., … Kõljalg, U. (2010). The UNITE database for molecular identification of fungi - recent updates and future perspectives. New Phytologist. http://doi.org/10.1111/j.1469-8137.2009.03160

Amend, A. S., Seifert, K. A., & Bruns, T. D. (2010). Quantifying microbial communities with 454 pyrosequencing: Does read abundance count? Molecular Ecology, 19(24), 5555–5565. http://doi.org/10.1111/j.1365-294X.2010.04898.x

Anderson, M. J., Ellingsen, K. E., & McArdle, B. H. (2006). Multivariate dispersion as a measure of beta diversity. Ecology Letters, 9, 683–693. http://doi.org/10.1111/j.1461-0248.2006.00926.x

Arnold, A. E. (2007). Understanding the diversity of foliar endophytic fungi: progress, challenges, and frontiers. Fungal Biology Reviews, 21(2–3), 51–66. http://doi.org/10.1016/j.fbr.2007.05.003

Arnold, A. E., & Engelbrecht, B. M. J. (2007). Fungal endophytes nearly double minimum leaf conductance in seedlings of a neotropical tree species. Journal of Tropical Ecology, 23(3), 369. http://doi.org/10.1017/S0266467407004038

Arnold, A. E., & Lutzoni, F. (2007). Diversity and Host Range of Foliar Fungal Endophytes: Are Tropical Leaves Biodiversity Hotspots? Ecology, 88(3), 541–549.

Arnold, A. E., Miadlikowska, J., Higgins, K. L., Sarvate, S. D., Gugger, P., Way, A., Lutzoni, F. (2009). A phylogenetic estimation of trophic transition networks for ascomycetous Fungi: Are lichens cradles of symbiotrophic Fungal diversification? Systematic Biology, 58(3), 283–297. http://doi.org/10.1093/sysbio/syp001

Bivand, R., Keitt, T., & Rowlingson, B. (2016). rgdal: Bindings for the Geospatial Data Abstraction Library. Retrieved from https://cran.r-project.org/package=rgdal

Busby, P. E., Newcombe, G., Dirzo, R., & Whitham, T. G. (2014). Differentiating genetic and environmental drivers of plant-pathogen community interactions. Journal of Ecology, 102(5), 1300–1309. http://doi.org/10.1111/1365-2745.12270

Busby, P. E., Zimmerman, N., Weston, D. J., Jawdy, S. S., Houbraken, J., & Newcombe, G. (2013). Leaf endophytes and Populus genotype affect severity of damage from the necrotrophic leaf pathogen, Drepanopeziza populi. Ecosphere, 4(10), art125. http://doi.org/10.1890/ES13-00127.1

Cannon, P. F., Damm, U., Johnston, P. R., & Weir, B. S. (2012). Colletotrichum - current status and future directions. Studies in Mycology, 73, 181–213. http://doi.org/10.3114/sim0014

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K. (2010). QIIME allows analysis of high-throughput community sequencing data. Nature Methods, 7(5), 335–336.

Davis, E. C., Franklin, J. B., Shaw, A. J., & Vilgalys, R. (2003). Endophytic Xylaria (Xylariaceae) among liverworts and angiosperms: Phylogenetics. American Journal of Botany, 90(11), 1661–1667.

Dean, R., Kan, J. A. L., Pretorius, Z. A., Hammond-Kosack, K. E., Di Pietro, A., Spanu, P. D., … Foster, G. (2012). The Top 10 Fungal Pathogens in Molecular Plant Pathology. Molecular Plant Pathology, 13(4), 414–430.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797.

Frazier, A. G., Giambelluca, T. W., Diaz, H. F., & Needham, H. L. (2015). Comparison of geostatistical approaches to spatially interpolate month-year rainfall for the Hawaiian Islands. International Journal of Climatology.

Funk, V. A., & Wagner, W. L. (1995). Biogeography of seven ancient Hawaiian plant lineages. Hawaiian Biogeography: Evolution on a Hot Spot Archipelago, 160–194.

Gardes, M., & Bruns, T. D. (1993). ITS primers with enhanced specificity for basidiomycetes - application to the identification of mycorrhizae and rusts.

Gazis, R., Miadlikowska, J., Lutzoni, F., Arnold, A. E., & Chaverri, P. (2012). Culture-based study of endophytes associated with rubber trees in Peru reveals a new class of : Xylonomycetes. Molecular Phylogenetics and Evolution, 65(1), 294–304. http://doi.org/10.1016/j.ympev.2012.06.019

Hashizume, Y., Sahashi, N., & Fukuda, K. (2008). The influence of altitude on endophytic mycobiota in Quercus acuta leaves collected in two areas 1000 km apart. Forest Pathology, 38(3), 218–226. http://doi.org/10.1111/j.1439-0329.2008.00547.

Higgins, K. L., Arnold, A. E., Miadlikowska, J., Sarvate, S. D., & Lutzoni, F. (2007). Phylogenetic relationships, host affinity, and geographic structure of boreal and arctic endophytes from three major plant lineages. Molecular Phylogenetics and Evolution, 42(2), 543–555. http://doi.org/10.1016/j.ympev.2006.07.012

Hijmans, R. J. (2016). raster: Geographic Data Analysis and Modeling. Retrieved from https://cran.r-project.org/package=raster

Hoffman, M., Gunatilaka, M., Ong, J., Shimabukuro, M., & Arnold, A. E. (2008). Molecular analysis reveals a distinctive fungal endophyte community associated with foliage of Montane oaks in southeastern Arizona. Journal of the Arizona-Nevada Academy of Science, 91–100.

Hoffman, M. T., & Arnold, A. E. (2008). Geographic locality and host identity shape fungal endophyte communities in cupressaceous trees. Mycological Research, 112(3), 331–344. http://doi.org/10.1016/j.mycres.2007.10.014

Howarth, D. G., & Baum, D. A. (2005). Genealogical Evidence of Homoploid Hybrid Speciation in an Adaptive Radiation of Scaevola (Goodeniaceae) in the Hawaiian Islands, 59(5), 948–961.

Howarth, D. G., Gustafsson, M. H. G., Baum, D. A., & Motley, T. J. (2003). Phylogenetics of the genus Scaevola (Goodeniaceae): Implication for dispersal patterns across the Pacific Basin and colonization of the Hawaiian Islands. American Journal of Botany, 90(6), 915–923. http://doi.org/10.3732/ajb.90.6.915

Joshee, S., Paulus, B. C., Park, D., & Johnston, P. R. (2009). Diversity and distribution of fungal foliar endophytes in New Zealand Podocarpaceae. Mycological Research, 113(9), 1003–1015. http://doi.org/10.1016/j.mycres.2009.06.004

Jumpponen, A., & Jones, K. L. (2010). Seasonally dynamic fungal communities in the Quercus macrocarpa phyllosphere differ between urban and nonurban environments. New Phytologist, 186(2), 496–513. http://doi.org/10.1111/j.1469-8137.2010.03197.

Kembel, S. W., & Mueller, R. C. (2014). Plant traits and taxonomy drive host associations in tropical phyllosphere fungal communities. Botany, 92(4), 303–311. http://doi.org/10.1139/cjb-2013-0194

Legendre, P., & Anderson, M. J. (1999). Distance-Based Redundancy Analysis: Testing Multispecies Responses in Multifactorial Ecological Experiments. Ecological Monographs, 69, 1–24. http://doi.org/10.1890/0012-9615(1999)069[0001:DBRATM]2.0.CO;2

Massimo, N. C., Nandi Devan, M. M., Arendt, K. R., Wilch, M. H., Riddle, J. M., Furr, S. H., … Arnold, A. E. (2015). Fungal endophytes in aboveground tissues of desert plants: Infrequent in culture, but highly diverse and distinctive symbionts. Microbial Ecology, (2015), 61–76. http://doi.org/10.1007/s00248-014-0563-6

McKown, A. D., Akamine, M. E., & Sack, L. (2016). Trait convergence and diversification arising from a complex evolutionary history in Hawaiian species of Scaevola. Oecologia, 181(4), 1083–1100. http://doi.org/10.1007/s00442-016-3640-3

Mejía, L. C., Herre, E. A., Jed, P., Winter, K., García, M. N., Bael, S. A. Van, … Maximova, S. N. (2014). Pervasive effects of a dominant foliar endophytic on host genetic and phenotypic expression in a tropical tree, 5(September), 1–16. http://doi.org/10.3389/fmicb.2014.00479

O’Rorke, R., Holland, B. S., Cobian, G. M., Gaughen, K., & Amend, A. S. (2016). Dietary preferences of Hawaiian tree snails to inform culture for conservation. Biological Conservation, 198, 177–182. http://doi.org/10.1016/j.biocon.2016.03.022

Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., … Wagner, H. (2016). vegan: Community Ecology Package. Retrieved from https://cran.r-project.org/package=vegan

Oono, R., Lefévre, E., Simha, A., & Lutzoni, F. (2015). A comparison of the community diversity of foliar fungal endophytes between seedling and adult loblolly pines (Pinus taeda). Fungal Biology, 119(10), 917–928. http://doi.org/10.1016/j.funbio.2015.07.003

Perpiñán, O., & Hijmans, R. (2016). rasterVis. Retrieved from http://oscarperpinan.github.io/rastervis/

R Core Team. (2016). R: A Language and Environment for Statistical Computing. Vienna, Austria. Retrieved from https://www.r-project.org/

Réblová, M., Gams, W., & Seifert, A. (2011). Monilochaetes and allied genera of the Glomerellales, and a reconsideration of families in the Microascales. Studies in Mycology, 68, 163–191. http://doi.org/10.3114/sim.2011.68.07

Rivera-Orduña, F. N., Suarez-Sanchez, R. A., Flores-Bustamante, Z. R., Gracida-Rodriguez, J. N., & Flores-Cotera, L. B. (2011). Diversity of endophytic fungi of Taxus globosa (Mexican yew). Fungal Diversity, 47, 65–74. http://doi.org/10.1007/s13225-010-0045-1

Rojas, E. I., Rehner, S. A., Samuels, G. J., Van Bael, S. A., Herre, E. A., Cannon, P., Chen, R., Pang, J., Wang, R., Zhang, Y., Peng, Y.-Q., & Sha, T. (2010). Colletotrichum gloeosporioides s.l. associated with Theobroma cacao and other plants in Panama: Multilocus phylogenies distinguish host-associated pathogens from asymptomatic endophytes. Mycologia, 102(6), 1318–1338.

Rosa, L. H., Almeida Vieira, M. D. L., Santiago, I. F., & Rosa, C. A. (2010). Endophytic fungi community associated with the dicotyledonous plant Colobanthus quitensis (Kunth) Bartl. (Caryophyllaceae) in Antarctica. FEMS Microbiology Ecology, 73(1), 178–189. http://doi.org/10.1111/j.1574-6941.2010.00872.

Saikkonen, K., Faeth, S. H., Helander, M., & Sullivan, T. J. (1998). Fungal Endophytes: A Continuum of Interactions with Host Plants. Annual Review of Ecology and Systematics, 29, 319–343.

Saucedo-García, A., Anaya, A. L., Espinosa-García, F. J., & González, M. C. (2014). Diversity and Communities of Foliar Endophytic Fungi from Different Agroecosystems of Coffea arabica L. in Two Regions of Veracruz, Mexico. PLoS ONE, 9(6), 1–11. http://doi.org/10.1371/journal.pone.0098454

Schmidt, P. A., Bálint, M., Greshake, B., Bandow, C., Römbke, J., & Schmitt, I. (2013). Illumina metabarcoding of a soil fungal community. Soil Biology and Biochemistry, 65, 128–132. http://doi.org/10.1016/j.soilbio.2013.05.014

Schoch, C. L., Sung, G. H., López-Giráldez, F., Townsend, J. P., Miadlikowska, J., Hofstetter, V., Spatafora, J. W. (2009). The ascomycota tree of life: A phylum-wide phylogeny clarifies the origin and evolution of fundamental reproductive and ecological traits. Systematic Biology, 58(2), 224–239. http://doi.org/10.1093/sysbio/syp020

Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312–1313. http://doi.org/10.1093/bioinformatics/btu033

Thomas, D. C., Vandegrift, R., Ludden, A., Carroll, G. C., & Roy, B. A. (2016). Spatial Ecology of the Fungal Genus Xylaria in a Tropical Cloud Forest. Biotropica, 0(0), 1–13. http://doi.org/10.1111/btp.12273

U’Ren, J. M., Lutzoni, F., Miadlikowska, J., & Arnold, A. E. (2010). Community Analysis Reveals Close Affinities Between Endophytic and Endolichenic Fungi in Mosses and Lichens. Microbial Ecology, 60(2), 340–353. http://doi.org/10.1007/s00248-010-9698-2

U’Ren, J. M., Lutzoni, F., Miadlikowska, J., Laetsch, A. D., & Arnold, E. (2012). Host and geographic structure of endophytic and endolichenic fungi at a continental scale. American Journal of Botany, 99(5), 898–914. http://doi.org/10.3732/ajb.1100459

U’Ren, J. M., Miadlikowska, J., Zimmerman, N. B., Lutzoni, F., Stajich, J. E., & Arnold, A. E. (2016). Contributions of North American endophytes to the phylogeny, ecology, and taxonomy of Xylariaceae (Sordariomycetes, Ascomycota). Molecular Phylogenetics and Evolution, 98, 210–232. http://doi.org/10.1016/j.ympev.2016.02.010

U’Ren, J. M., Riddle, J. M., Monacell, J. T., Carbone, I., Miadlikowska, J., & Arnold, A. E. (2014). Tissue storage and primer selection influence pyrosequencing-based inferences of diversity and community composition of endolichenic and endophytic fungi. Molecular Ecology Resources, 14(5), 1032–1048. http://doi.org/10.1111/1755-0998.12252

Van Bael, S. A., Fernández-Marín, H., Valencia, M. C., Rojas, E. I., Wcislo, W. T., & Herre, E. A. (2009). Two fungal symbioses collide: endophytic fungi are not welcome in leaf-cutting ant gardens. Proceedings. Biological Sciences / The Royal Society, 276(1666), 2419–2426. http://doi.org/10.1098/rspb.2009.0196

Vandruff, S. M. (2014). Variation of subterranean fungal community structure across substrate age and elevation in native forests on Hawai’i Island, USA (Masters Thesis). University Of Hawai’i At Hilo.

Vega, F. E., Simpkins, A., Aime, M. C., Posada, F., Peterson, S. W., Rehner, S. A., Arnold, A. E. (2010). Fungal endophyte diversity in coffee plants from Colombia, Hawai’i, Mexico and Puerto Rico. Fungal Ecology, 3(3), 122–138. http://doi.org/10.1016/j.funeco.2009.07.002

Vilgalys, R., & Hester, M. (1990). Rapid Genetic Identification and Mapping of Enzymatically Amplified Ribosomal DNA from Several Cryptococcus Species, 172(8).

Vincent, J. B., Weiblen, G. D., & May, G. (2016). Host associations and beta diversity of fungal endophyte communities in New Guinea rainforest trees. Molecular Ecology, 25(3), 825–841. http://doi.org/10.1111/mec.13510

von Arx, J. A. (1957). Die Arten der Gattung Colletotrichum Corda. Phytopathol. Z, 29, 168–413.

White, T. J., Bruns, T., Lee, S., Taylor, J. W., & others. (1990). Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. PCR Protocols: A Guide to Methods and Applications, 18(1), 315–322.

Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. Retrieved from http://ggplot2.org

Wickham, H. (2016). tidyr: Easily Tidy Data with `spread()` and `gather()` Functions. Retrieved from https://cran.r-project.org/package=tidyr

Wickham, H., & Francois, R. (2016). dplyr: A Grammar of Data Manipulation. Retrieved from https://cran.r-project.org/package=dplyr

Yu, G., Smith, D., Zhu, H., Guan, Y., & Lam, T. T.-Y. (n.d.). ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution.

Zhang, J., Kobert, K., Flouri, T., & Stamatakis, A. (2014). PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics, 30(5), 614–620. http://doi.org/10.1093/bioinformatics/btt593

Zimmerman, N. B., & Vitousek, P. M. (2012). Fungal endophyte communities reflect environmental structuring across a Hawaiian landscape. Proceedings of the National Academy of Sciences, 109(32), 13022–13027. http://doi.org/10.1073/pnas.1209872109

Appendix 1: Sample Metadata

Table 3. Top genera identified by QIIME taxonomic assignment of Illumina ITS sequences.

genus                   Reads    OTUs
g__unidentified         108796   399
g__Colletotrichum       82398    165
g__Pseudocercospora     62214    31
g__Byssochlamys         45074    20
g__Mycosphaerella       32510    86
NA                      28388    36
g__Stenella             23328    13
g__Derxomyces           23080    17
g__Rachicladosporium    21664    9
g__Ramichloridium       10424    23
g__Phyllosticta         8911     10
g__Strelitziana         7311     1
g__Lophiostoma          6204     4
g__Capnobotryella       3984     2
g__Pestalotiopsis       3665     10

Table 4. Top species identified by QIIME taxonomic assignment of cultured isolate ITS sequences.

Assigned Taxonomy                     Isolates   OTUs
s__Colletotrichum_gloeosporioides     20         6
s__Neofusicoccum_parvum               10         2
s__unculturedfungus                   8          6
s__Phomopsis_sp_H1                    7          3
s__Annulohypoxylon_stygium            6          2
s__Glomerella_cingulata               6          2
s__Glomerella_acutata                 5          2
s__Pestalotiopsis_sp_SD01ES           5          2
s__unculturedAscomycota               5          3
s__fungal_endophyte_sp_g83            4          2
s__Fusarium_solani_f_batatas          4          1
s__Diaporthe_cynaroidis               3          1
s__Hypocreales_sp_HLS104              3          1
s__Cladosporium_sp_TMS_2011           1          1
s__Diaporthe_leucospermi              1          1
s__fungal_endophyte                   1          1
s__fungal_endophyte_sp_g68            1          1
s__fungal_endophyte_sp_ICMP_15986     1          1
s__fungal_endophyte_sp_ICMP_16021     1          1
s__fungal_sp_ARIZ_L192                1          1
s__Gibberella_sp_UFMGCB_536           1          1
s__Pestalotiopsis_sp_14JAES           1          1
s__Pestalotiopsis_sp_34V              1          1
s__Pestalotiopsis_sp_CHTAM50          1          1
s__Phomopsis_asparagi                 1          1
s__unculturedzygomycete               1          1

Table 5. Collection metadata for samples used for fungal culturing.

Sample   Species          Date      Site                              Latitude      Longitude     Elevation
BI190    chamissoniana    3/24/15   Along Volcano Rd                  19.44781302   -155.201946   969
BI191    chamissoniana    3/24/15   Along Volcano Rd                  19.448024     -155.201526   969
BI192    chamissoniana    3/24/15   Corner of Pearl Ave. and 4th St   19.42913697   -155.223696   1106
BI193    chamissoniana    3/24/15   Corner of Ruby Ave and 6th St.    19.42887101   -155.219743   1097
BI226    chamissoniana    3/25/15   Kimball property                  19.61577496   -155.930943   686
BI245    chamissoniana    3/25/15   South side of Volcano Rd          19.42279397   -155.240994   1161
K101     gaudichaudiana   6/19/15   Kanaele Bog                       21.97092203   -159.508411   640
K112     gaudichaudiana   6/19/15   Kanaele Bog                       21.98047799   -159.50375    649
K116     gaudichaudiana   6/19/15   Kanaele Bog                       21.98283599   -159.500682   658
K036     glabra           6/15/15   Koke'e State Park                 22.148279     -159.636252   1234
K055     glabra           6/16/15   Na Pali-Kona Forest Reserve       22.14991498   -159.622635   1236
K006     gaudichaudii     6/15/15   Waimea Canyon State Park          22.03424899   -159.66943    713
K007     gaudichaudii     6/15/15   Waimea Canyon State Park          22.03424899   -159.66943    713
BI181    kilaueae         3/22/15   Hilina Pali Rd.                   19.338455     -155.274434   962
BI184    kilaueae         3/22/15   Hilina Pali Rd.                   19.33876798   -155.27426    969
BI185    kilaueae         3/22/15   Hilina Pali Rd.                   19.33935002   -155.27419    969
BI186    kilaueae         3/22/15   Hilina Pali Rd.                   19.33921902   -155.274021   969
BI187    kilaueae         3/22/15   Hilina Pali Rd.                   19.34040899   -155.274379   969
BI188    kilaueae         3/22/15   Hilina Pali Rd.                   19.34648403   -155.264303   988
BI189    kilaueae         3/22/15   Hilina Pali Rd.                   19.36365997   -155.250105   1015
EKH28    mollis           9/22/15   Ekahanui                          21.43842299   -158.096179   814
K115     mollis           6/19/15   Kanaele Bog                       21.98283599   -159.500682   658
SP10     mollis           8/28/15   Skeeter Pass                      21.51188803   -158.136377   995
K009     procera          6/15/15   Koke'e State Park                 22.13759802   -159.653476   1234
K021     procera          6/15/15   Koke'e State Park                 22.15062099   -159.644267   1253
K048     procera          6/16/15   Na Pali-Kona Forest Reserve       22.14830498   -159.629142   1259
BI092    taccada          7/24/14   Malamaki                          19.44483066   -154.855712   18
BI172    taccada          3/21/15   Chain of Craters Rd               19.29312903   -155.101665   9
BI173    taccada          3/21/15   Chain of Craters Rd               19.29334796   -155.101028   9
K070     taccada          6/17/15   Kamala Point                      21.89280404   -159.410451   2
K078     taccada          6/17/15   Kamala Point                      21.89348398   -159.405066   9
K097     taccada          6/18/15   Secret Beach                      22.22209197   -159.417565   2
K122     taccada          6/20/15   Puu Ka Pele                       22.09730797   -159.745284   3
Maka13   taccada          7/21/15   Makapuu                           21.31622      -157.66309    17
Maka2    taccada          7/21/15   Sandy Beach                       21.2889       -157.66639    14

Table 6. Collection metadata for samples used for Illumina sequencing.

Sample      Species          Reads   Collection_Date   Island   Elevation
K.0101.P4   gaudichaudiana   1835    6/19/15           K        640
BI.245      chamissoniana    1967    3/25/15           BI       1161
K.0009      procera          6955    6/15/15           K        1234
K.0115      mollis           9430    6/19/15           K        658
K.0048      procera          12879   6/16/15           K        1259
K.0006      gaudichaudii     15564   6/15/15           K        713
K.0078      taccada          19655   6/17/15           K        9
K.0055      glabra           24361   6/16/15           K        1236
BI.190      chamissoniana    32012   3/24/15           BI       969
K.0036      glabra           46158   6/15/15           K        1234
EKH28       mollis           47350   9/22/15           O        814.136597
K.0007      gaudichaudii     62338   6/15/15           K        713
BI.092      taccada          70212   7/24/14           BI       18
SP10        mollis           77780   8/28/15           O        995.025635
BI.226      chamissoniana    84189   3/25/15           BI       686

Table 7. Unique barcodes used to amplify samples during Illumina sequencing.

Sample      ITS1F_Barcode   ITS2_Barcode   NEW_ITS2_BC
K.0101.P4   CATGACTGCATA    TGGTTGGTTACG   CATACGAGATTGGTTGGTTACG
BI.245      GAGGCTAAGCCT    GCGTTGCAAACT   CATACGAGATGCGTTGCAAACT
K.0009      CTAGCTCTCTAT    CCTGCTTCCTTC   CCTGCTTCCTTCAGTCAGTCAG
K.0115      GAGCAGAGTAGA    ATTAAGCCTGGA   CATACGAGATATTAAGCCTGGA
K.0048      GAGCAGAGTAGA    CAACTCCCGTGA   CATACGAGATCAACTCCCGTGA
K.0006      CTAGCTCTCTAT    AGGCTTACGTGT   CATACGAGATAGGCTTACGTGT
K.0078      GCGATAGATCGC    TTGGCTCTATTC   CATACGAGATTTGGCTCTATTC
K.0055      CTAGCTCTCTAT    CTGTCAGTGACC   CATACGAGATCTGTCAGTGACC
BI.190      GAGCAGAGTAGA    GAGCCATCTGTA   CATACGAGATGAGCCATCTGTA
K.0036      CATGACTGCATA    TCACCTCCTTGT   CATACGAGATTCACCTCCTTGT
EKH28       GAGGCTAAGCCT    TGGAGTAGGTGG   CATACGAGATTGGAGTAGGTGG
K.0007      GAGCAGAGTAGA    GTTCTCTTCTCG   CATACGAGATGTTCTCTTCTCG
BI.092      GCGATAGATCGC    GAACACTTTGGA   CATACGAGATGAACACTTTGGA
SP10        GAGGCTAAGCCT    GATGCTGCCGTT   CATACGAGATGATGCTGCCGTT
BI.226      CATGACTGCATA    AGCATGTCCCGT   CATACGAGATAGCATGTCCCGT

Table 8. Diversity, evolutionary lineage, and environmental data used for multivariate ordination of Illumina sequencing data.

Host Species     Sample      Host Lineage   MAP           MAT          Shannon Diversity
gaudichaudiana   K.0101.P4   Lineage C      3563.589111   17.0136699   1.00138547
chamissoniana    BI.245      Lineage C      2937.110107   13.8710002   1.48969043
procera          K.0009      Lineage C      1579.590942   13.5705003   3.29928527
mollis           K.0115      Lineage C      4041.501953   16.6591701   2.70942353
procera          K.0048      Lineage C      2119.195068   13.5263299   1.84750740
gaudichaudii     K.0006      Lineage C      823.7675781   17.1096706   1.44851126
taccada          K.0078      Lineage B      1025.998047   21.6252498   0.15469858
glabra           K.0055      Lineage A      2336.849121   14.3746700   2.27571990
chamissoniana    BI.190      Lineage C      4589.443848   14.9185800   1.93919150
glabra           K.0036      Lineage A      1777.474976   13.4975795   1.18382326
mollis           EKH28       Lineage C      1217.607056   16.9367504   3.21016061
gaudichaudii     K.0007      Lineage C      823.7675781   17.1096706   2.97627384
taccada          BI.092      Lineage B      2345.298096   21.4791698   1.48024529
mollis           SP10        Lineage C      1684.324951   15.2994203   1.67378841
chamissoniana    BI.226      Lineage C      1185.45105    17.0869998   3.28668592

Appendix 2: Bioinformatics Pipeline for Processing Illumina Data

FEF_Survey_ITS/data

##########
###RUN1###
##########
./FEF_Survey_ITS/first_run

#########
#RawData#
#########
./FEF_Survey/raw_data/first_run

#SHA-1
$ shasum Undetermined_S0_L001_R*_001.fastq
R1: 4be9bc5271ad23073b1b4c165aaf37d1b5a34fdf Undetermined_S0_L001_R1_001.fastq
R2: d999eed7bdb855ced4c58812caaa9e35398cc761 Undetermined_S0_L001_R2_001.fastq
$ shasum Undetermined_S0_L001_I*_001.fastq
I1:
I2:

##########
## PEAR ##
##v0.9.6##
##########
$ pear -f ./FEF_Survey/first_run/data/seqs/raw_data/Undetermined_S0_L001_R1_001.fastq \
       -r ./FEF_Survey/first_run/data/seqs/raw_data/Undetermined_S0_L001_R2_001.fastq \
       -o out -m 550 -n 75 -y 8G

PEAR v0.9.6 [January 15, 2015]
Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: Undetermined_S0_L001_R1_001.fastq
Reverse reads file.................: Undetermined_S0_L001_R2_001.fastq
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 550
Minimum assembly length............: 75
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 1

Allocating memory..................: 8,589,934,592 bytes
Computing empirical frequencies....: DONE
  A: 0.224484
  C: 0.264752
  G: 0.255886
  T: 0.254878
  41437 uncalled bases
Assemblying reads: 100%

Assembled reads ...................: 13,934,262 / 14,868,974 (93.714%)
Discarded reads ...................: 0 / 14,868,974 (0.000%)
Not assembled reads ...............: 934,712 / 14,868,974 (6.286%)
Assembled reads file...............: out.assembled.fastq
Discarded reads file...............: out.discarded.fastq
Unassembled forward reads file.....: out.unassembled.forward.fastq
Unassembled reverse reads file.....: out.unassembled.reverse.fastq

#SHA-1 out.assembled.fastq
$ shasum out.assembled.fastq
44ca761beed9261766f43b034899bef3b2ef6a88
$ mkdir ./FEF_Survey/first_run/data/seqs/pear
$ mv out.assembled.fastq ./FEF_Survey/first_run/data/seqs/pear
$ mv out.discarded.fastq ./FEF_Survey/first_run/data/seqs/pear
$ mv out.unassembled.forward.fastq ./FEF_Survey/first_run/data/seqs/pear
$ mv out.unassembled.reverse.fastq ./FEF_Survey/first_run/data/seqs/pear

############
## FASTQC ##
##v0.11.04##
############
Assembled Reads:
$ fastqc out.assembled.fastq
Total Sequences                      13934262
Sequences flagged as poor quality    0
Sequence length                      75-490
%GC                                  51

###############
##   QIIME   ##
##  v1.9.1   ##
##Demultiplex##
###############
$ macqiime
$ validate_mapping_file.py -m ./FEF_Survey/first_run/data/mapping_survey.txt
$ extract_barcodes.py --input_type barcode_paired_end \
    -f ./FEF_Survey/first_run/data/seqs/raw_data/index1.fastq \
    -r ./FEF_Survey/first_run/data/seqs/raw_data/index2.fastq \
    --bc1_len 12 --bc2_len 12 \
    -o ./FEF_Survey/first_run/data/seqs/parsed_barcodes/

#gives back three files: barcodes.fastq, reads1.fastq, reads2.fastq
#barcodes.fastq is used for downstream steps
$ filter_fasta.py \
    -f ./FEF_Survey/first_run/data/seqs/parsed_barcodes/barcodes.fastq \
    -o ./FEF_Survey/first_run/data/seqs/split_libraries/barcodes_filtered.fastq \
    -a ./FEF_Survey/first_run/data/seqs/split_libraries/out.unassembled.forward.fna \
    -n
$ split_libraries_fastq.py -q 19 \
    -i ./FEF_Survey/first_run/data/seqs/pear/out.assembled.fastq \
    -o ./FEF_Survey/first_run/data/seqs/split_libraries/ \
    -m ./FEF_Survey/first_run/data/mapping_survey.txt \
    -b ./FEF_Survey/first_run/data/seqs/parsed_barcodes/barcodes_filtered.fastq \
    --store_demultiplexed_fastq --barcode_type 24

#SHA-1
$ shasum seqs.fna
1bb9b4ee015efc10ffbf1758b0197c92c318eff9 seqs.fna

##########
###Run2###
##########
./FEF_Survey/second_run

#########
#RawData#
#########
./FEF_Survey/raw_data/second_run

SHA-1
#sequence
$ shasum Undetermined_S0_L001_R*_001.fastq
R1: aa1b970750d6698500c4d132b68a4b63dcb5a07c Undetermined_S0_L001_R1_001.fastq
R2: 24895c881b7743874a02a1ce4c12e9edee98120b Undetermined_S0_L001_R2_001.fastq
#index
$ shasum Undetermined_S0_L001_I*_001.fastq
I1: 8ab9f7aec49abccc7462dccf68ce38932dad687c Undetermined_S0_L001_I1_001.fastq
I2: 288a0645276b118e5c962a23a5b9c4e86f39775a Undetermined_S0_L001_I2_001.fastq

########
##PEAR##
########

PEAR v0.9.8 [April 9, 2015]
Citation - PEAR: a fast and accurate Illumina Paired-End reAd mergeR
Zhang et al (2014) Bioinformatics 30(5): 614-620 | doi:10.1093/bioinformatics/btt593

Forward reads file.................: /home/gmcobian/lus/Projects/FEF_Survey/data/seqs/raw_data/second_run/R1.fastq
Reverse reads file.................: /home/gmcobian/lus/Projects/FEF_Survey/data/seqs/raw_data/second_run/R2.fastq
PHRED..............................: 33
Using empirical frequencies........: YES
Statistical method.................: OES
Maximum assembly length............: 550
Minimum assembly length............: 75
p-value............................: 0.010000
Quality score threshold (trimming).: 0
Minimum read size after trimming...: 1
Maximal ratio of uncalled bases....: 1.000000
Minimum overlap....................: 10
Scoring method.....................: Scaled score
Threads............................: 1

Allocating memory..................: 8,589,934,592 bytes
Computing empirical frequencies....: DONE
  A: 0.221524
  C: 0.267700
  G: 0.254290
  T: 0.256485
  232992 uncalled bases
Assemblying reads: 0%
Assemblying reads: 18%
Assemblying reads: 37%
Assemblying reads: 56%
Assemblying reads: 75%
Assemblying reads: 94%
Assemblying reads: 100%

Assembled reads ...................: 17,394,585 / 18,888,478 (92.091%)
Discarded reads ...................: 0 / 18,888,478 (0.000%)
Not assembled reads ...............: 1,493,893 / 18,888,478 (7.909%)
Assembled reads file...............: out.assembled.fastq
Discarded reads file...............: out.discarded.fastq
Unassembled forward reads file.....: out.unassembled.forward.fastq
Unassembled reverse reads file.....: out.unassembled.reverse.fastq

#SHA-1
$ shasum ./second_run/data/seqs/split_libraries/seqs.fna
c96deacbd56be9e133363314549becb8ea389239 out.assembled.fastq
$ mkdir ./FEF_Survey/second_run/data/seqs/pear
$ mv out.assembled.fastq ./FEF_Survey/second_run/data/seqs/pear
$ mv out.discarded.fastq ./FEF_Survey/second_run/data/seqs/pear
$ mv out.unassembled.forward.fastq ./FEF_Survey/second_run/data/seqs/pear
$ mv out.unassembled.reverse.fastq ./FEF_Survey/second_run/data/seqs/pear

###############
##   QIIME   ##
##  v1.9.1   ##
##Demultiplex##
###############
$ macqiime
$ validate_mapping_file.py -m ./FEF_Survey/data/22bp_mapping_survey.txt
$ extract_barcodes.py --input_type barcode_paired_end \
    -f ./FEF_Survey_ITS/raw_data/second_run/index1.fastq \
    -r ./FEF_Survey_ITS/raw_data/second_run/index2.fastq \
    --bc1_len 22 --bc2_len 12 \
    -o ./FEF_Survey_ITS/second_run/data/seqs/parsed_barcodes/
$ convert_fastaqual_fastq.py \
    -f ./FEF_Survey_ITS/second_run/data/seqs/pear/out.unassembled.forward.fastq \
    -c fastq_to_fastaqual \
    -o ./FEF_Survey_ITS/second_run/data/seqs/pear/
$ filter_fasta.py \
    -f ./FEF_Survey_ITS/second_run/data/seqs/parsed_barcodes/barcodes.fastq \
    -o ./FEF_Survey_ITS/second_run/data/seqs/parsed_barcodes/barcodes_filtered.fastq \
    -a ./FEF_Survey_ITS/second_run/data/seqs/pear/out.unassembled.forward.fna \
    -n
$ split_libraries_fastq.py -q 19 \
    -i ./FEF_Survey/second_run/data/seqs/pear/out.assembled.fastq \
    -o ./FEF_Survey/second_run/data/seqs/split_libraries/ \
    -m ./FEF_Survey/second_run/data/22bp_mapping_survey.txt \
    -b ./FEF_Survey/second_run/data/seqs/parsed_barcodes/barcodes_filtered.fastq \
    --store_demultiplexed_fastq --barcode_type 24

#SHA-1
$ shasum seqs.fna

###########################
###Combine Run1 and Run2###
###########################
./FEF_Survey_ITS
$ cat ./first_run/data/seqs/split_libraries/seqs.fna \
      ./second_run/data/seqs/split_libraries/seqs.fna \
      > ./data/seqs/combined_seqs.fna
$ shasum ./data/seqs/combined_seqs.fna
1bb9b4ee015efc10ffbf1758b0197c92c318eff9 seqs.fna

###################
##Remove Chimeras##
##    vsearch    ##
##    v1.9.10    ##
###################
$ vsearch -uchime_ref ./FEF_Survey/data/seqs/combined_seqs.fna \
    -db ./ITS_Refs/uchime_sh_refs_dynamic_develop_985_01.01.2016.ITS1.fasta \
    -strand plus \
    -nonchimeras ./FEF_Survey/data/seqs/remove_chimeras/no_chimeras_combined_seqs.fna \
    -threads 20
$ shasum no_chimeras_combined_seqs.fna
12267c2b5c35423361e46e1c96854a096684cfed no_chimeras_combined_seqs.fna

####################
###Open Reference###
### OTU Picking  ###
####################
$ macqiime
$ pick_open_reference_otus.py \
    -i ./FEF_Survey/data/seqs/remove_chimeras/no_chimeras_combined_seqs.fna \
    -o ./FEF_Survey/data/OTUs/open_ref_sup4 \
    -r ./ITS_Refs/sh_refs_qiime_ver7_97_s_31.01.2016_w_outgroup.fasta \
    -s 0.01 --suppress_align_and_tree --suppress_step4 \
    -p ./FEF_Survey/jobs/OTU_params_97_w_outgroup.txt \
    -a -f -O 20 -m sortmerna_sumaclust

#parameter file content
./FEF_Survey/jobs/OTU_params_97_w_outgroup.txt
assign_taxonomy:id_to_taxonomy_fp ./ITS_Refs/sh_taxonomy_qiime_ver7_dynamic_s_31.01.2016_w_outgroup.txt
assign_taxonomy:reference_seqs_fp ./ITS_Refs/sh_refs_qiime_ver7_dynamic_s_31.01.2016_w_outgroup.fasta
assign_taxonomy:assignment_method blast

#open reference OTU picking summary
$ macqiime
$ biom summarize-table \
    -i ./FEF_Survey_ITS/data/OTUs/open_ref_sup4/otu_table_mc2_w_tax.biom \
    > ./FEF_Survey_ITS/data/OTUs/open_ref_sup4/otu_summary.txt
$ biom add-metadata \
    -i ./FEF_Survey_ITS/data/OTUs/open_ref_sup4/otu_table_mc2_w_tax.biom \
    -m ./FEF_Survey_ITS/data/22bp_mapping_survey_w_metadata.txt \
    -o ./FEF_Survey_ITS/data/OTUs/open_ref_sup4/otu_table_mc2_w_tax_w_metadata.biom

############################
###Import OTU Data into R###
########## RStudio #########
######### v0.99.893 ########
############# R ############
########## v3.2.4 ##########
############################

##FEF_Survey_ITS_OTU_Data ##

#####################
### Open Packages ###
#####################
library(biomformat)

###################################
### import data and clean it up ###
###################################
#upload biom file into R
data_biom <- read_biom("/Users/Gerry/Dropbox/FEF_Survey_ITS/data/OTUs/open_ref_sup4/otu_table_mc2_w_tax_w_metadata.biom")
#Warning message:
#In strsplit(msg, "\n") : input string 1 is invalid in this locale

#convert biom file into matrix then into data frame
its_dat <- as.matrix(biom_data(data_biom))
its_dat <- as.data.frame(its_dat)

#make data frame of taxonomy metadata
all_taxonomy <- observation_metadata(data_biom)
all_taxonomy <- as.data.frame(all_taxonomy)
colnames(all_taxonomy) <- c("kingdom", "phyla", "class", "order", "family", "genus", "species")

#make data frame of sample metadata
metadata <- sample_metadata(data_biom)
metadata <- as.data.frame(metadata)

#remove OTUs that are not fungi
#this can be done later if you want to check for plant OTUs first
f_taxonomy <- subset(all_taxonomy, all_taxonomy[,1] == "k__Fungi")
#all_taxonomy: 5505 observations, f_taxonomy: 5139 observations, 366 observation difference

#make data look like taxonomy
f_its_dat <- its_dat[rownames(f_taxonomy),]

#transpose data
tf_its_dat <- t(its_dat)

#check metadata file matches data file
all.equal(rownames(tf_its_dat), rownames(metadata))
###[1] TRUE
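The taxonomy filter and sample-name check above can be illustrated on a toy table. This is only a sketch in base R: the OTU names, counts, and metadata values are hypothetical, and it assumes that the fungal-only table (`f_its_dat`) is the one intended for transposition and downstream use.

```r
# Toy OTU table: rows are OTUs, columns are samples (hypothetical values)
its_dat <- data.frame(
  S1 = c(10, 0, 3),
  S2 = c(0, 5, 2),
  row.names = c("OTU1", "OTU2", "OTU3")
)

# Toy taxonomy table keyed by the same OTU names
all_taxonomy <- data.frame(
  kingdom = c("k__Fungi", "k__Plantae", "k__Fungi"),
  row.names = c("OTU1", "OTU2", "OTU3"),
  stringsAsFactors = FALSE
)

# Toy sample metadata keyed by sample name
metadata <- data.frame(Island = c("K", "BI"), row.names = c("S1", "S2"))

# Keep fungal OTUs only, then subset the count table to match
f_taxonomy <- subset(all_taxonomy, kingdom == "k__Fungi")
f_its_dat <- its_dat[rownames(f_taxonomy), , drop = FALSE]

# Transpose so that samples are rows, as community-ecology functions usually expect
tf_its_dat <- t(f_its_dat)

# Confirm that sample order matches the metadata
samples_match <- all.equal(rownames(tf_its_dat), rownames(metadata))
```

Using `drop = FALSE` keeps the result a data frame even if only one OTU or one sample survives the filter.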

Appendix 3: Data Analysis in R

Importing Data and Plotting Summary Barplot:

library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-0
library(raster)
## Loading required package: sp
library(rasterVis)
## Loading required package: latticeExtra
## Loading required package: RColorBrewer
library(rgdal)
## rgdal: version: 1.1-10, (SVN revision 622)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 1.11.4, released 2016/01/25
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/gdal
## Loaded PROJ.4 runtime: Rel. 4.9.1, 04 March 2015, [PJ_VERSION: 491]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.3/Resources/library/rgdal/proj
## Linking to sp version: 1.2-3
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:raster':
##
##     intersect, select, union
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following object is masked from 'package:latticeExtra':
##
##     layer
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:raster':
##
##     extract

#OTU table
read.csv("clean_otus", stringsAsFactors = F, row.names = 1) -> clean_otus
#rarefy to minimum sequencing depth
clean_t_otus <- as.data.frame(t(clean_otus))
clean_min_depth = min(colSums(clean_t_otus))
otus_rarefied <- as.data.frame(round(rrarefy(clean_otus, clean_min_depth)))

#SampleMetadata
read.csv("clean_meta", stringsAsFactors = F) -> clean_meta
row.names(clean_meta) <- clean_meta$Sample

#PlantTraits
read.csv("PlantTraits.csv", stringsAsFactors = F) -> PlantTraits
PlantTraits$X -> row.names(PlantTraits)
PlantTraits %>% select(-X) -> PlantTraits
as.data.frame(t(PlantTraits)) -> PlantTraits
gsub("S\\.\\.", "", row.names(PlantTraits)) -> row.names(PlantTraits)
gsub("coriaceab", "coriacea", row.names(PlantTraits)) -> row.names(PlantTraits)
mutate(PlantTraits, Species = row.names(PlantTraits)) -> PlantTraits
left_join(clean_meta, PlantTraits, by = "Species") -> clean_meta_Traits

#Bar graphs of fungal families recovered from each Scaevola individual
#library(dplyr)
#organize data into long format (preferred by ggplot)
read.csv("taxonomy", stringsAsFactors = F) -> taxonomy
taxonomy %>% rename(OTU = X) -> taxonomy
clean_t_otus %>% mutate(OTU = row.names(clean_t_otus)) -> Scaev_w_tax
left_join(Scaev_w_tax, taxonomy) -> Scaev_taxonomy
## Joining, by = "OTU"
Scaev_taxonomy %>% gather(Sample, Reads, 1:15) -> long_Scaev_taxonomy

#host meta data and fungal taxonomy in one big table for ggplot
left_join(long_Scaev_taxonomy, clean_meta) -> Scaev_Data
## Joining, by = "Sample"
as.vector(Scaev_Data$Species) -> Scaev_Data$Species
Scaev_Data$Species[Scaev_Data$Species == "guadichaudii"] <- "gaudichaudii"
Scaev_Data %>% mutate(PA = ifelse(Reads > 0, 1, 0)) -> Scaev_Data
Scaev_Data %>% rename(Distinct_OTUs = PA) -> Scaev_Data
Scaev_Data %>% arrange(class) -> Scaev_Data

#plot data, must set geom_bar to stat = "identity" or else error
colourCount = length(unique(Scaev_Data$class))
getPalette = colorRampPalette(brewer.pal(9, "RdYlBu"))
p <- ggplot(data = Scaev_Data, aes(x = Species, y = Distinct_OTUs, fill = class, cex.lab = 3)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = getPalette(colourCount))
p

#To get Plot attributes:
#ggplot_build(p)
#unique(g$data[[1]]["fill"])

Importing GIS raster data and extracting values at specific coordinates:

#Temp, Rain, Elevation import and factoring

#data downloaded from http://climate.geography.hawaii.edu/
rain = raster("/Users/seanswift/Dropbox/Scaevola_Illumina_R_Proj/Analysis/GIS/StateASCIIGrids_mm/rfgrid_mm_state_ann.txt")
temp = raster("/Users/seanswift/Dropbox/Scaevola_Illumina_R_Proj/Analysis/GIS/Tair_ann_hr_ascii/tair_ann_24.txt")

#Convert to spatial points dataframe
clean_meta_Traits$Latitude <- as.numeric(clean_meta_Traits$Latitude)
clean_meta_Traits$Longitude <- as.numeric(clean_meta_Traits$Longitude)
coordinates(clean_meta_Traits) = c("Longitude", "Latitude")
detach("package:tidyr", unload = TRUE)

#extract point values from collection locations and add to metadata
clean_meta_Traits$rain = extract(rain, clean_meta_Traits)
clean_meta_Traits$temp = extract(temp, clean_meta_Traits)
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:raster':
##
##     extract

#Create levels for elevation and Mean Annual Temp
as.data.frame(clean_meta_Traits) -> clean_meta_Traits
clean_meta_Traits %>% mutate(ElevFactor = cut(clean_meta_Traits$Elevation, seq(0, 1300, 260), right = FALSE, dig.lab = 4)) -> clean_meta_Traits_Env
clean_meta_Traits_Env %>% mutate(TempFactor = cut(clean_meta_Traits_Env$temp, seq(13, 22, 0.5), right = FALSE)) -> clean_meta_Traits_Env
clean_meta_Traits_Env$ElevFactor <- as.vector(clean_meta_Traits_Env$ElevFactor)
clean_meta_Traits_Env$TempFactor <- as.vector(clean_meta_Traits_Env$TempFactor)
as.factor(gsub("\\[(\\w+),(\\w+)\\)", "\\1-\\2", clean_meta_Traits_Env$ElevFactor, perl = T)) -> clean_meta_Traits_Env$ElevFactor
as.factor(gsub(",", "-", clean_meta_Traits_Env$TempFactor, perl = T)) -> clean_meta_Traits_Env$TempFactor
as.factor(gsub("\\[", "", clean_meta_Traits_Env$TempFactor, perl = T)) -> clean_meta_Traits_Env$TempFactor
as.factor(gsub("\\)", "", clean_meta_Traits_Env$TempFactor, perl = T)) -> clean_meta_Traits_Env$TempFactor
gsub("\\.", "", colnames(clean_meta_Traits_Env)) -> colnames(clean_meta_Traits_Env)
arrange(clean_meta_Traits_Env, Sample) -> clean_meta_Traits_Env
clean_meta_Traits_Env$Lineage <- gsub("Lineage ", "", clean_meta_Traits_Env$Lineage)
clean_meta_Traits_Env %>% select(-X) -> clean_meta_Traits_Env
row.names(clean_meta_Traits_Env) <- clean_meta_Traits_Env$Sample
clean_meta_Traits_Env[row.names(clean_otus),] -> clean_meta_Traits_Env
clean_meta_Traits_Env$Sample == row.names(clean_otus)
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE

#count number of times each SampleID occurred at each temperature and elevation level
clean_meta_Traits_Env %>%
  filter(TotalReads > 0) %>%
  group_by(Sample, TempFactor) %>%
  summarise(TempCount = n()) %>%
  spread(key = TempFactor, value = TempCount) -> clean_meta_Traits_Env_abund
clean_meta_Traits_Env %>%
  filter(TotalReads > 0) %>%
  group_by(Sample, ElevFactor) %>%
  summarise(ElevCount = n()) %>%
  spread(key = ElevFactor, value = ElevCount) %>%
  right_join(clean_meta_Traits_Env_abund) -> clean_meta_Traits_Env_abund
## Joining, by = "Sample"

################
##Summary Data##
################
long_Scaev_taxonomy %>%
  group_by(genus) %>%
  summarise(Reads = sum(Reads), OTUs = n_distinct(OTU)) %>%
  arrange(desc(Reads), desc(OTUs)) %>%
  slice(1:15) -> top_genera
top_genera
## # A tibble: 15 x 3
##    genus                  Reads  OTUs
##
##  1 g__unidentified       108796   399
##  2 g__Colletotrichum      82398   165
##  3 g__Pseudocercospora    62214    31
##  4 g__Byssochlamys        45074    20
##  5 g__Mycosphaerella      32510    86
##  6 <NA>                   28388    36
##  7 g__Stenella            23328    13
##  8 g__Derxomyces          23080    17
##  9 g__Rachicladosporium   21664     9
## 10 g__Ramichloridium      10424    23
## 11 g__Phyllosticta         8911    10
## 12 g__Strelitziana         7311     1
## 13 g__Lophiostoma          6204     4
## 14 g__Capnobotryella       3984     2
## 15 g__Pestalotiopsis       3665    10

long_Scaev_taxonomy %>%
  filter(Reads > 0) %>%
  group_by(OTU) %>%
  summarise(Samples = n_distinct(Sample)) -> SampleCount
#p<-ggplot(data = SampleCount, aes(x=reorder(OTU, Samples), y = Samples)) + geom_bar(stat="identity")
#p<- p+theme(axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank())
p <- ggplot(SampleCount, aes(x = Samples))
p <- p + geom_histogram(binwidth = 1) +
  scale_x_continuous(limits = c(0, 15), breaks = 1:15) +
  labs(title = "Endophyte OTU Occurance Across Samples")
p
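The OTU occurrence histogram above counts, for each OTU, the number of samples in which it has at least one read. A minimal base R sketch of the same count on a wide sample-by-OTU matrix follows; the matrix values and names are hypothetical, and this is an illustration rather than the code used for the thesis figure.

```r
# Toy community matrix: rows are samples, columns are OTUs (hypothetical counts)
otu <- matrix(
  c(5, 0, 2,
    0, 0, 1,
    3, 4, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("S1", "S2", "S3"), c("OTU1", "OTU2", "OTU3"))
)

# Number of samples in which each OTU is present (read count > 0)
occurrence <- colSums(otu > 0)

# Frequency of each occurrence value, i.e. the heights of the histogram bars
occurrence_table <- table(occurrence)
```

Here `occurrence_table` reports how many OTUs were found in exactly one sample, exactly two samples, and so on.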

Calculating Shannon Diversity and Bray-Curtis dissimilarity:

##############################################
#Diversity calculation and adding to metadata#
##############################################
clean_meta_Traits_Env %>% mutate(ShanDiversity = diversity(clean_otus)) -> clean_meta_Traits_Env
clean_meta_Traits_Env$ShanDiversity <- as.numeric(clean_meta_Traits_Env$ShanDiversity)
clean_meta_Traits_Env %>% group_by(ElevFactor) %>% summarise(mean(ShanDiversity))
## # A tibble: 4 x 2
##   ElevFactor mean(ShanDiversity)
##
## 1 0-260                0.8174719
## 2 1040-1300            2.0192053
## 3 520-780              2.2844560
## 4 780-1040             2.2743802
summary(aov(clean_meta_Traits_Env$ShanDiversity ~ clean_meta_Traits_Env$ElevFactor))
##                                  Df Sum Sq Mean Sq F value Pr(>F)
## clean_meta_Traits_Env$ElevFactor  3  3.429  1.1430   1.405  0.293
## Residuals                        11  8.948  0.8135
p <- plot(TukeyHSD(aov(clean_meta_Traits_Env$ShanDiversity ~ clean_meta_Traits_Env$ElevFactor)))
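For reference, the Shannon index returned by `diversity()` with its default settings is H = -sum(p_i * ln(p_i)), where p_i is the proportion of reads belonging to OTU i. The following base R sketch computes it by hand for hypothetical count vectors; the helper name `shannon` is introduced here only for illustration.

```r
# Shannon diversity with natural logarithms, computed from raw counts.
# Zero counts are dropped because 0 * log(0) is taken to be 0.
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Two equally abundant OTUs give H = log(2)
h_even <- shannon(c(10, 10))

# A sample dominated by a single OTU has H = 0
h_single <- shannon(c(5, 0, 0))

# An uneven community falls between the two extremes
h_uneven <- shannon(c(90, 5, 5))
```

Low values, such as the 0.15 reported for sample K.0078 in Table 8, indicate a community dominated by very few OTUs.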

plot(TukeyHSD(aov(clean_meta_Traits_Env$ShanDiversity ~ clean_meta_Traits_Env$Lineage)))

p <- hist(clean_meta_Traits_Env$ShanDiversity)

p
## $breaks
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5
##
## $counts
## [1] 1 0 5 3 1 2 3
##
## $density
## [1] 0.1333333 0.0000000 0.6666667 0.4000000 0.1333333 0.2666667 0.4000000
##
## $mids
## [1] 0.25 0.75 1.25 1.75 2.25 2.75 3.25
##
## $xname
## [1] "clean_meta_Traits_Env$ShanDiversity"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"

#Clean colnames of trait metadata and convert to numeric
gsub("\\.", "", colnames(clean_meta_Traits_Env)) -> colnames(clean_meta_Traits_Env)
gsub(" ", "", colnames(clean_meta_Traits_Env)) -> colnames(clean_meta_Traits_Env)
lapply(clean_meta_Traits_Env[,23:49], as.numeric) -> clean_meta_Traits_Env[,23:49]
lapply(clean_meta_Traits_Env[,51:56], as.numeric) -> clean_meta_Traits_Env[,51:56]
#get rid of problematic/useless columns
clean_meta_Traits_Env %>% select(-c(41:43)) -> clean_meta_Traits_Env

#Distance Matrix
sqrt_otus_rarefied = sqrt(otus_rarefied)
rank.totus <- rankindex(as.matrix(sqrt_otus_rarefied), otus_rarefied, indices = c("bray", "euclid", "manhattan", "horn"), method = "spearman")
print(paste("The highest rank was given by the", names(sort(rank.totus, decreasing = TRUE)[1]), "method."))
## [1] "The highest rank was given by the bray method."
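The Bray-Curtis index selected above measures dissimilarity between two samples as the sum of absolute differences in OTU abundance divided by the total abundance in both samples. A minimal base R sketch with hypothetical counts is given below; the helper name `bray_curtis` is an illustration only and is not part of vegan.

```r
# Bray-Curtis dissimilarity between two abundance vectors of equal length.
# 0 means identical composition; 1 means no shared OTUs.
bray_curtis <- function(x, y) {
  sum(abs(x - y)) / sum(x + y)
}

# Two samples with no OTUs in common
d_disjoint <- bray_curtis(c(1, 0), c(0, 1))

# Two identical samples
d_same <- bray_curtis(c(2, 3), c(2, 3))

# Partially overlapping samples: (4 + 7 + 2) / 33
d_partial <- bray_curtis(c(6, 7, 4), c(10, 0, 6))
```

Values close to 1, which dominate the matrix for these samples, indicate that most pairs of host individuals share few OTUs at comparable abundance.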
(vegdist(otus_rarefied, "bray")) -> Bray_dist
Bray_dist
##        K.0101.P4    BI.245    K.0009    K.0115    K.0048    K.0006
## BI.245 0.9907357
## K.0009 0.9961853 0.9940054
## K.0115 0.9673025 0.9831063 0.9803815
## K.0048 0.9525886 0.8070845 0.9673025 0.9395095
## K.0006 0.9967302 0.9972752 0.9896458 0.9950954 0.9950954
## K.0078 0.9972752 0.9994550 0.9950954 0.9967302 1.0000000 1.0000000
## K.0055 0.9820163 0.9787466 0.9602180 0.9204360 0.9792916 0.9885559
## BI.190 0.9269755 0.9798365 0.9623978 0.9885559 0.9787466 0.9858311
## K.0036 0.9978202 0.9972752 0.9847411 0.9847411 0.9978202 0.9989101
## EKH28  0.9444142 0.9863760 0.9825613 0.9531335 0.9869210 0.9989101
## K.0007 0.8877384 0.9901907 0.9716621 0.9782016 0.9760218 0.9264305
## BI.092 0.9989101 0.9967302 0.9934605 0.9825613 0.9989101 1.0000000
## SP10   0.9978202 0.9891008 0.9978202 0.9880109 0.9956403 1.0000000
## BI.226 0.9019074 0.9863760 0.9836512 0.9378747 0.9885559 0.9956403
##           K.0078    K.0055    BI.190    K.0036     EKH28    K.0007
## BI.245
## K.0009
## K.0115
## K.0048
## K.0006
## K.0078
## K.0055 0.9967302
## BI.190 0.9989101 0.7400545
## K.0036 0.9989101 0.8833787 0.9967302
## EKH28  0.9847411 0.9226158 0.9776567 0.9738420
## K.0007 0.9880109 0.9689373 0.9335150 0.9972752 0.8479564
## BI.092 0.9520436 0.9722071 0.9983651 0.9841962 0.9743869 0.9814714
## SP10   0.9983651 0.9961853 0.9940054 0.9961853 0.9961853 0.9983651
## BI.226 0.9961853 0.9351499 0.9591281 0.9803815 0.9106267 0.8910082
##           BI.092      SP10
## BI.245
## K.0009
## K.0115
## K.0048
## K.0006
## K.0078
## K.0055
## BI.190
## K.0036
## EKH28
## K.0007
## BI.092
## SP10   0.9994550
## BI.226 0.9782016 0.9929155

Calculating multivariate dispersion/Beta diversity:

################################
#Calculate dispersion of groups#
################################
clean_meta_Traits_Env$ElevFactor <- as.vector(clean_meta_Traits_Env$ElevFactor)
clean_meta_Traits_Env$ElevFactor <- as.factor(clean_meta_Traits_Env$ElevFactor)
betadisper(Bray_dist, clean_meta_Traits_Env$Lineage) -> mod
betadisper(Bray_dist, clean_meta_Traits_Env$ElevFactor) -> mod2
betadisper(Bray_dist, clean_meta_Traits_Env$Species) -> mod3

## Perform test
anova(mod)
## Analysis of Variance Table
##
## Response: Distances
##           Df   Sum Sq  Mean Sq F value    Pr(>F)
## Groups     2 0.111263 0.055632  114.53 1.521e-08 ***
## Residuals 12 0.005829 0.000486
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mod2)
## Analysis of Variance Table
##
## Response: Distances
##           Df   Sum Sq   Mean Sq F value   Pr(>F)
## Groups     3 0.026753 0.0089176  9.6424 0.002058 **
## Residuals 11 0.010173 0.0009248
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(mod3)
## Analysis of Variance Table
##
## Response: Distances
##           Df   Sum Sq  Mean Sq F value   Pr(>F)
## Groups     6 0.275869 0.045978  192.09 3.35e-08 ***
## Residuals  8 0.001915 0.000239
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## Permutation test for F
permutest(mod2, pairwise = TRUE, permutations = 99)
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 99
##
## Response: Distances
##           Df   Sum Sq   Mean Sq      F N.Perm Pr(>F)
## Groups     3 0.026753 0.0089176 9.6424     99   0.01 **
## Residuals 11 0.010173 0.0009248
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
##               0-260 1040-1300   520-780 780-1040
## 0-260               0.0100000 0.0100000     0.04
## 1040-1300 0.0011872           0.7800000     0.13
## 520-780   0.0116488 0.8055671               0.36
## 780-1040  0.0016332 0.0840219 0.3188043

## Tukey's Honest Significant Differences
(mod.HSD <- TukeyHSD(mod))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = distances ~ group, data = df)
##
## $group
##           diff         lwr        upr     p adj
## B-A 0.03433243 -0.02446479 0.09312964 0.3004116
## C-A 0.21088982  0.16569203 0.25608760 0.0000001
## C-B 0.17655739  0.13135961 0.22175517 0.0000006
mod.HSD
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
##
## Fit: aov(formula = distances ~ group, data = df)
##
## $group
##           diff         lwr        upr     p adj
## B-A 0.03433243 -0.02446479 0.09312964 0.3004116
## C-A 0.21088982  0.16569203 0.25608760 0.0000001
## C-B 0.17655739  0.13135961 0.22175517 0.0000006
mod
##
##  Homogeneity of multivariate dispersions
##
## Call: betadisper(d = Bray_dist, group =
## clean_meta_Traits_Env$Lineage)
##
## No. of Positive Eigenvalues: 14
## No. of Negative Eigenvalues: 0
##
## Average distance to median:
##      A      B      C
## 0.4417 0.4760 0.6526
##
## Eigenvalues for PCoA axes:
##  PCoA1  PCoA2  PCoA3  PCoA4  PCoA5  PCoA6  PCoA7  PCoA8
## 0.6871 0.6451 0.6128 0.5366 0.5136 0.5036 0.4874 0.4732

## Plot the groups and distances to centroids on the
## first two PCoA axes
plot(mod)
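The idea behind the dispersion test can be sketched in a simplified form: each sample's distance to the center of its group is computed, and groups are then compared on those distances. The base R example below is an assumption-laden illustration, not a reimplementation of `betadisper()`: it uses hypothetical two-dimensional coordinates and the arithmetic centroid, whereas the analysis above works in PCoA space derived from Bray-Curtis distances and reports distances to the spatial median.

```r
# Hypothetical ordination coordinates for six samples in two groups
coords <- matrix(
  c(0, 0,
    2, 0,
    0, 2,
    2, 2,
    10, 10,
    10, 12),
  ncol = 2, byrow = TRUE
)
group <- factor(c("A", "A", "A", "A", "B", "B"))

# Euclidean distance from each sample to the centroid of its own group
dist_to_centroid <- function(coords, group) {
  # Per-group column sums divided by group sizes give the centroids
  centroids <- rowsum(coords, group) / as.vector(table(group))
  own_centroid <- centroids[as.character(group), , drop = FALSE]
  sqrt(rowSums((coords - own_centroid)^2))
}

d <- dist_to_centroid(coords, group)

# Average dispersion per group, analogous to "Average distance to median"
mean_dispersion <- tapply(d, group, mean)
```

A group with a larger mean distance is more heterogeneous in composition; an ANOVA or permutation test on `d` by `group` then asks whether that difference is larger than expected by chance.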

scores(mod)
## $sites
##                 PCoA1        PCoA2
## K.0101.P4  0.12737451  0.239795133
## BI.245    -0.43967204 -0.043596227
## K.0009    -0.05283227 -0.091851554
## K.0115    -0.04298475 -0.026012062
## K.0048    -0.42542639 -0.013206327
## K.0006    -0.02772913  0.112624445
## K.0078    -0.10772531  0.061598996
## K.0055     0.25609198 -0.429434524
## BI.190     0.25197022 -0.330966868
## K.0036     0.04923099 -0.254422911
## EKH28      0.19958284  0.216631264
## K.0007     0.21505507  0.335493106
## BI.092    -0.05840676  0.023385773
## SP10      -0.14112345 -0.003022965
## BI.226     0.19659448  0.202984719
##
## $centroids
##          PCoA1       PCoA2
## A  0.152661486 -0.34192872
## B -0.083066035  0.04249238
## C -0.009318898  0.05799931

## with data ellipses instead of hulls
plot(mod, ellipse = TRUE, hull = FALSE) # 1 sd data ellipse
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-
## deficient or indefinite
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-
## deficient or indefinite

plot(mod, ellipse = TRUE, hull = FALSE, conf = 0.95) # 95% data ellipse
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-
## deficient or indefinite
## Warning in chol.default(cov, pivot = TRUE): the matrix is either rank-
## deficient or indefinite

## can also specify which axes to plot, ordering respected
plot(mod, axes = c(3,1), seg.col = "forestgreen", seg.lty = "dashed")

## Draw a boxplot of the distances to centroid for each group
boxplot(mod)
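The dispersion test above compares how far samples sit from the middle of their own group in principal coordinate space. As a rough base R sketch of that idea only (not the vegan implementation): it assumes Euclidean distances on invented data and uses group centroids, whereas `betadisper()` was run here on Bray-Curtis distances and reports distances to the spatial median. The objects `toy`, `grp` and `dist_to_centroid` are hypothetical names.

```r
set.seed(1)
# Invented 2-D data: one tight group and one spread-out group (assumption)
toy <- rbind(matrix(rnorm(20, sd = 0.2), ncol = 2),
             matrix(rnorm(20, sd = 1.0), ncol = 2))
grp <- factor(rep(c("tight", "spread"), each = 10))

d <- dist(toy)                 # Euclidean distances between samples
pcoa <- cmdscale(d, k = 2)     # principal coordinates analysis (PCoA)

# Group centroids on each PCoA axis (rows = groups, columns = axes)
centroids <- apply(pcoa, 2, function(axis) tapply(axis, grp, mean))

# Distance from every sample to the centroid of its own group
dist_to_centroid <- sqrt(rowSums((pcoa - centroids[as.character(grp), ])^2))

# ANOVA on those distances: do the groups differ in dispersion?
disp_test <- anova(lm(dist_to_centroid ~ grp))
group_means <- tapply(dist_to_centroid, grp, mean)
```

A large F statistic in `disp_test` indicates unequal spread among groups, which matters when interpreting PERMANOVA results on the same grouping.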

#Calculating and plotting rarefaction curves for OTU accumulation:

####################
#Rarefaction Curves#
####################
#By sample:
clean_otus.S <- specnumber(clean_otus)
clean_otus.raremax <- min(apply(clean_otus, 1, sum))
clean_otus.Srare <- rarefy(clean_otus, clean_otus.raremax)
plot(clean_otus.S, clean_otus.Srare, xlab = "Obs No. Species", ylab = "Rarefied No. Spec")
abline(0, 1) # 1:1 reference line

rarecurve(clean_otus, step = 20, sample = clean_otus.raremax, col = "blue", cex = 1)
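The `rarefy()` call above estimates how many OTUs would be expected if every sample were subsampled to the depth of the shallowest sample. The following is a minimal base R sketch of the hypergeometric expectation behind classical rarefaction (Hurlbert's formula), assuming an invented count matrix rather than the thesis data; `toy_otus` and `expected_richness` are hypothetical names.

```r
# Invented community matrix: rows = samples, columns = OTUs (assumption)
toy_otus <- matrix(c(10, 5, 0, 1,
                      3, 3, 3, 3,
                     20, 0, 0, 0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("s1", "s2", "s3"), paste0("OTU", 1:4)))

# Expected number of OTUs in a random subsample of `depth` reads drawn
# without replacement from a single sample
expected_richness <- function(counts, depth) {
  counts <- counts[counts > 0]
  total <- sum(counts)
  stopifnot(depth <= total)
  # P(OTU i is absent) = choose(total - n_i, depth) / choose(total, depth);
  # computed on the log scale to avoid overflow with large read counts
  p_absent <- exp(lchoose(total - counts, depth) - lchoose(total, depth))
  sum(1 - p_absent)
}

raremax  <- min(rowSums(toy_otus))   # depth of the shallowest sample
rarefied <- apply(toy_otus, 1, expected_richness, depth = raremax)
observed <- rowSums(toy_otus > 0)    # observed OTU richness per sample
```

Samples whose rarefied richness falls well below their observed richness are the ones whose apparent diversity depends most on sequencing depth.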

#By Elevation group:
clean_otus %>%
  mutate(Elev = clean_meta_Traits_Env$ElevFactor) %>%
  group_by(Elev) %>%
  summarise_each(funs(sum)) -> elev_otus
row.names(elev_otus) <- elev_otus$Elev
## Warning: Setting row names on a tibble is deprecated.
elev_otus %>% select(-Elev)-> elev_otus
elev_otus.S <- specnumber(elev_otus)
elev_otus.raremax <- min(apply(elev_otus, 1, sum))
elev_otus.Srare <- rarefy(elev_otus, elev_otus.raremax)
plot(elev_otus.S, elev_otus.Srare, xlab = "Obs No. Species", ylab = "Rarefied No. Spec")
abline(0, 1) # 1:1 reference line

rarecurve(elev_otus, step = 20, sample = elev_otus.raremax, col = "blue", cex = 1)

#By Host Species:
clean_otus %>%
  mutate(Host = clean_meta_Traits_Env$Species) %>%
  group_by(Host) %>%
  summarise_each(funs(sum)) -> host_otus
row.names(host_otus) <- host_otus$Host
## Warning: Setting row names on a tibble is deprecated.
host_otus %>% select(-Host)-> host_otus
host_otus.S <- specnumber(host_otus)
host_otus.raremax <- min(apply(host_otus, 1, sum))
host_otus.Srare <- rarefy(host_otus, host_otus.raremax)
plot(host_otus.S, host_otus.Srare, xlab = "Obs No. Species", ylab = "Rarefied No. Spec")
abline(0, 1) # 1:1 reference line

rarecurve(host_otus, step = 20, sample = host_otus.raremax, col = "blue", cex = 1)

#By Host Lineage:
clean_otus %>%
  mutate(Lineage = clean_meta_Traits_Env$Lineage) %>%
  group_by(Lineage) %>%
  summarise_each(funs(sum)) -> Lineage_otus
row.names(Lineage_otus) <- Lineage_otus$Lineage
## Warning: Setting row names on a tibble is deprecated.
Lineage_otus %>% select(-Lineage)-> Lineage_otus
Lineage_otus.S <- specnumber(Lineage_otus)
Lineage_otus.raremax <- min(apply(Lineage_otus, 1, sum))
Lineage_otus.Srare <- rarefy(Lineage_otus, Lineage_otus.raremax)
rarecurve(Lineage_otus, step = 20, sample = Lineage_otus.raremax, col = "blue", cex = 1)
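The grouped pipelines above pool OTU counts within each elevation band, host species, or lineage, and then assign row names to a tibble, which is what raises the deprecation warning. One possible alternative, sketched here on invented data, is base R's `rowsum()`, which returns a plain matrix whose row names are the group labels. `toy_otus` and `toy_group` are hypothetical stand-ins for `clean_otus` and a grouping column.

```r
# Invented OTU table: rows = samples, columns = OTUs (assumption)
toy_otus <- matrix(c(4, 0, 1,
                     2, 3, 0,
                     0, 5, 5,
                     1, 1, 1),
                   nrow = 4, byrow = TRUE,
                   dimnames = list(paste0("sample", 1:4), paste0("OTU", 1:3)))

# Hypothetical grouping variable, for example a host lineage per sample
toy_group <- c("A", "B", "A", "B")

# rowsum() adds up the rows that share a group label; the result is a
# matrix with one row per group, named by the group labels
group_otus <- rowsum(toy_otus, group = toy_group)

# Pooled read depth per group, as used to choose a rarefaction depth
group_depth <- rowSums(group_otus)
```

Because the result is an ordinary matrix with row names, it could be passed on to functions that expect a sites-by-species table without a separate row-name step.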

#Total:
clean_otus %>% summarise_each(funs(sum))-> total_otus
total_otus.S <- specnumber(total_otus)
total_otus.raremax <- min(apply(total_otus, 1, sum))
total_otus.Srare <- rarefy(total_otus, total_otus.raremax)
plot(total_otus.S, total_otus.Srare, xlab = "Obs No. Species", ylab = "Rarefied No. Spec")
abline(0, 1) # 1:1 reference line

rarecurve(total_otus, step = 20, sample = total_otus.raremax, col = "blue", cex = 1)

#Calculating multivariate statistics explaining community composition:

##################################################
#Multivariate Statistics of Community Composition#
##################################################
#environmental
adonis(Bray_dist ~ Lineage+temp, data=clean_meta_Traits_Env, permutations = 6000)
##
## Call:
## adonis(formula = Bray_dist ~ Lineage + temp, data = clean_meta_Traits_Env, permutations = 6000)
##
## Permutation: free
## Number of permutations: 6000
##
## Terms added sequentially (first to last)
##
##           Df SumsOfSqs MeanSqs F.Model      R2    Pr(>F)
## Lineage    2     1.103 0.55150  1.2315 0.16622 0.0148309 *
## temp       1     0.607 0.60701  1.3555 0.09147 0.0008332 ***
## Residuals 11     4.926 0.44781         0.74231
## Total     14     6.636                 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#Host Traits
adonis(Bray_dist ~ Height, data=clean_meta_Traits_Env, permutations = 999)
##
## Call:
## adonis(formula = Bray_dist ~ Height, data = clean_meta_Traits_Env, permutations = 999)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## Height     1    0.5158 0.51582  1.0957 0.07773  0.178
## Residuals 13    6.1202 0.47078         0.92227
## Total     14    6.6360                 1.00000
adonis(Bray_dist ~ Species +Island, data=clean_meta_Traits_Env, permutations = 999)
##
## Call:
## adonis(formula = Bray_dist ~ Species + Island, data = clean_meta_Traits_Env, permutations = 999)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## Species    6    2.9856 0.49761  1.0919 0.44992  0.078 .
## Island     2    0.9160 0.45800  1.0050 0.13804  0.528
## Residuals  6    2.7343 0.45572         0.41205
## Total     14    6.6360                 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Anova for diversity
plot(TukeyHSD(aov(clean_meta_Traits_Env$ShanDiversity ~ clean_meta_Traits_Env$ElevFactor)))

## Uneven sequencing depth may have an impact
readNumbers = apply(clean_otus,1,sum)
mutate(clean_meta_Traits_Env, sqrtReads = sqrt(TotalReads))->clean_meta_Traits_Env
## The average read number of OTUs
MeanCount=apply(clean_otus,2,function(vec) mean(vec[vec>0]))
## In how many samples is an OTU present?
TotPresent = apply(clean_otus,2,function(vec) sum(vec>0))
## The highest read number of an OTU in a sample
MaxCount=apply(clean_otus,2,max)
## Plotting incidence against abundance
plot(TotPresent, MaxCount, xlab="Incidence", ylab="Maximum Abundance", pch=20)

plot(TotPresent, log(MaxCount), xlab="Incidence", ylab="log(Maximum Abundance)", pch=20)

library(mvabund)
library(mgcv)
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
##     collapse
## The following object is masked from 'package:raster':
##
##     getData
## This is mgcv 1.8-13. For overview type 'help("mgcv-package")'.
## Create a smoothed trendline
gam1 = gam(log(MaxCount)~s(TotPresent))
plot(gam1, residuals=T, shade=T, rug=F, cex=2.6, xlab="Incidence", ylab="logMean Abundance") # , xaxp=c(0,150,15)

#Non-Metric Multi-Dimensional Scaling plot:

## Rarefied Full community NMDS
MDS.all <- metaMDS(Bray_dist)
## Run 0 stress 0.1733011
## Run 1 stress 0.1704113
## ... New best solution
## ... Procrustes: rmse 0.165 max resid 0.3482582
## Run 2 stress 0.1926501
## Run 3 stress 0.2398663
## Run 4 stress 0.1704112
## ... New best solution
## ... Procrustes: rmse 0.0001131668 max resid 0.0002271069
## ... Similar to previous best
## Run 5 stress 0.1704111
## ... New best solution
## ... Procrustes: rmse 4.480655e-05 max resid 8.282167e-05
## ... Similar to previous best
## Run 6 stress 0.1731518
## Run 7 stress 0.1731642
## Run 8 stress 0.1978448
## Run 9 stress 0.1646939
## ... New best solution
## ... Procrustes: rmse 0.1382272 max resid 0.3456474
## Run 10 stress 0.1903629
## Run 11 stress 0.2658654

## Run 12 stress 0.1704126
## Run 13 stress 0.1646927
## ... New best solution
## ... Procrustes: rmse 0.0004397774 max resid 0.001123847
## ... Similar to previous best
## Run 14 stress 0.2338114
## Run 15 stress 0.1704172
## Run 16 stress 0.2039919
## Run 17 stress 0.2634836
## Run 18 stress 0.1926501
## Run 19 stress 0.1646947
## ... Procrustes: rmse 0.0005612703 max resid 0.001447219
## ... Similar to previous best
## Run 20 stress 0.2449735
## *** Solution reached
NMDS <- metaMDS(Bray_dist, previous = MDS.all)
## Starting from 2-dimensional configuration
## Run 0 stress 0.1646927
## Run 1 stress 0.1646947
## ... Procrustes: rmse 0.0006178828 max resid 0.001603417
## ... Similar to previous best
## Run 2 stress 0.2070391
## Run 3 stress 0.1704111
## Run 4 stress 0.1731632
## Run 5 stress 0.2103811
## Run 6 stress 0.2174316
## Run 7 stress 0.1646924
## ... New best solution
## ... Procrustes: rmse 0.0002154949 max resid 0.0004250305
## ... Similar to previous best
## Run 8 stress 0.1646947
## ... Procrustes: rmse 0.0008319433 max resid 0.002014447
## ... Similar to previous best
## Run 9 stress 0.1733012
## Run 10 stress 0.1646916
## ... New best solution
## ... Procrustes: rmse 0.001454724 max resid 0.003308742
## ... Similar to previous best
## Run 11 stress 0.2092388
## Run 12 stress 0.2020796
## Run 13 stress 0.1646914
## ... New best solution
## ... Procrustes: rmse 0.0001901231 max resid 0.0004187128
## ... Similar to previous best
## Run 14 stress 0.1644499
## ... New best solution
## ... Procrustes: rmse 0.01945659 max resid 0.05372317
## Run 15 stress 0.1644498
## ... New best solution
## ... Procrustes: rmse 7.634867e-05 max resid 0.000185818
## ... Similar to previous best
## Run 16 stress 0.1646914
## ... Procrustes: rmse 0.01942393 max resid 0.05359769
## Run 17 stress 0.1704151
## Run 18 stress 0.2132026
## Run 19 stress 0.2204661
## Run 20 stress 0.2195082
## *** Solution reached

MDS1 = NMDS$points[,1]
MDS2 = NMDS$points[,2]
NMDSplot <- mutate(clean_meta_Traits_Env, MDS1 = MDS1, MDS2 = MDS2, MDSNames = names(MDS1))
NMDSplot$MDSNames == NMDSplot$Sample
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE
ggplot(NMDSplot, aes(x=MDS1, y=MDS2, col=FEVdensity)) +
  geom_point(size=3) +
  stat_ellipse() +
  theme_bw() +
  labs(title = "NMDS Plot") +
  geom_text(data=NMDSplot,
            aes(x=MDS1,y=MDS2,label= paste(NMDSplot$Species, NMDSplot$Island)),
            size=3,vjust=2,hjust=.20)

#Redundancy analysis calculations and plots: ###################### ##Full community RDA## ###################### clean_meta_Traits_Env[c(1,2,3,6,7,14,23:60)]-> variables continuous_variables <- select(variables, 3,5,6:30, 32:37,39:41,43,44) 124 non_leaf_variables <- select(variables, 1:6,38:44) row.names(clean_otus) == variables$Sample ## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [15] TRUE otus0 <- rda(clean_otus ~ 1, data = continuous_variables) otus1 <- rda(clean_otus~ ., data=continuous_variables) ordistep(otus0, scope = formula(otus1)) ## ## Start: clean_otus ~ 1 ## ## Df AIC F Pr(>F) ## + ElevFactor 3 301.38 1.5578 0.030 * ## + Isolation_Site 10 291.11 2.4750 0.040 * ## + CN 1 300.77 1.7708 0.060 . ## + TotalReads 1 300.83 1.7151 0.080 . ## + X_13C 1 301.06 1.4954 0.080 . ## + NP 1 300.87 1.6798 0.085 . ## + Adaxialporelength 1 301.17 1.3841 0.105 ## + Elevation 1 300.81 1.7336 0.130 ## + sqrtReads 1 301.16 1.3938 0.130 ## + temp 1 300.77 1.7774 0.145 ## + Parea 1 301.14 1.4138 0.165 ## + Abaxialstomataldensity 1 301.10 1.4506 0.175 ## + LeafLW 1 301.31 1.2553 0.255 ## + Minorveindensity 1 301.31 1.2522 0.260 ## + AbaxialSPI 1 301.43 1.1422 0.300 ## + Pmass 1 301.47 1.1056 0.360 ## + Succulence 1 301.35 1.2181 0.390 ## + FEVdensity 1 301.49 1.0831 0.405 ## + Leafwidth 1 301.54 1.0339 0.415 ## + Height 1 301.50 1.0741 0.440 ## + Sclereidwidth 1 301.50 1.0691 0.445 ## + Totalveindensity 1 301.52 1.0566 0.445 ## + LMA 1 301.52 1.0526 0.450 ## + Branchdiameters 1 301.68 0.9085 0.485 ## + SclereidLW 1 301.59 0.9931 0.495 ## + Petiolediameter 1 301.64 0.9400 0.505 ## + Nmass 1 301.62 0.9629 0.525 ## + ShanDiversity 1 301.74 0.8528 0.560 ## + AdaxialSPI 1 301.68 0.9076 0.565 ## + Sclereiddensity 1 301.62 0.9609 0.585 ## + Adaxialstomataldensity 1 301.66 0.9234 0.585 ## + Narea 1 301.73 0.8616 0.620 ## + LeaTength 1 301.74 0.8530 0.645 ## + Leafteeth 1 301.97 0.6361 0.815 ## + Leafarea 1 301.98 0.6317 0.815 ## + Abaxialporelength 1 
301.93 0.6733 0.875
## + rain 1 302.15 0.4765 0.920
## + Sclereidlength 1 301.99 0.6204 0.940
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Step: clean_otus ~ ElevFactor
##
##              Df    AIC      F Pr(>F)
## - ElevFactor  3 300.69 1.5578  0.035 *

## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Df AIC F Pr(>F) ## + Isolation_Site 7 291.11 2.3111 0.020 * ## + TotalReads 1 301.19 1.5735 0.165 ## + sqrtReads 1 301.57 1.2832 0.285 ## + Elevation 1 301.79 1.1204 0.410 ## + LeafLW 1 301.93 1.0148 0.465 ## + ShanDiversity 1 301.98 0.9798 0.470 ## + Abaxialporelength 1 302.07 0.9143 0.490 ## + Sclereiddensity 1 302.21 0.8097 0.590 ## + Sclereidwidth 1 302.28 0.7595 0.630 ## + temp 1 302.35 0.7076 0.645 ## + Minorveindensity 1 302.36 0.7015 0.670 ## + rain 1 302.38 0.6873 0.715 ## + Adaxialporelength 1 302.34 0.7184 0.720 ## + Succulence 1 302.38 0.6862 0.720 ## + SclereidLW 1 302.36 0.7047 0.725 ## + X_13C 1 302.42 0.6582 0.730 ## + LeaTength 1 302.45 0.6368 0.740 ## + FEVdensity 1 302.34 0.7205 0.770 ## + CN 1 302.45 0.6367 0.770 ## + Totalveindensity 1 302.48 0.6176 0.770 ## + AbaxialSPI 1 302.57 0.5567 0.770 ## + Leafwidth 1 302.50 0.6047 0.775 ## + Narea 1 302.51 0.5961 0.775 ## + AdaxialSPI 1 302.43 0.6515 0.780 ## + Adaxialstomataldensity 1 302.44 0.6472 0.780 ## + Branchdiameters 1 302.52 0.5883 0.790 ## + LMA 1 302.55 0.5683 0.800 ## + Pmass 1 302.61 0.5295 0.800 ## + Nmass 1 302.50 0.6052 0.810 ## + Leafarea 1 302.61 0.5249 0.820 ## + NP 1 302.63 0.5156 0.825 ## + Abaxialstomataldensity 1 302.52 0.5916 0.830 ## + Parea 1 302.70 0.4663 0.845 ## + Petiolediameter 1 302.56 0.5644 0.850 ## + Sclereidlength 1 302.74 0.4364 0.890 ## + Leafteeth 1 302.65 0.4988 0.895 ## + Height 1 302.91 0.3200 0.960 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Step: clean_otus ~ ElevFactor + Isolation_Site ## ## Df AIC F Pr(>F) ## - Isolation_Site 7 301.38 2.3111 0.085 . ## - ElevFactor 0 291.11 -Inf ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1
##
##                          Df    AIC      F Pr(>F)
## + TotalReads              1 285.28 2.0546  0.180
## + sqrtReads               1 285.41 2.0097  0.190
## + ShanDiversity           1 287.07 1.4867  0.220
## + Branchdiameters         1 286.77 1.5776  0.260
## + Abaxialporelength       1 286.72 1.5914  0.265
## + Nmass                   1 286.77 1.5776  0.280
## + CN                      1 286.85 1.5520  0.305
## + Adaxialstomataldensity  1 286.92 1.5308  0.305

## + Leafwidth 1 287.78 1.2783 0.310 ## + FEVdensity 1 287.78 1.2783 0.310 ## + Petiolediameter 1 287.47 1.3673 0.320 ## + Narea 1 287.47 1.3673 0.325 ## + AdaxialSPI 1 287.02 1.4994 0.340 ## + Minorveindensity 1 287.47 1.3673 0.360 ## + LMA 1 287.47 1.3673 0.375 ## + Succulence 1 287.78 1.2783 0.400 ## + X_13C 1 287.47 1.3673 0.410 ## + SclereidLW 1 287.47 1.3673 0.420 ## + Leafteeth 1 289.06 0.9287 0.500 ## + Leafarea 1 289.06 0.9287 0.525 ## + Adaxialporelength 1 289.06 0.9276 0.530 ## + Totalveindensity 1 289.06 0.9287 0.550 ## + Height 1 289.06 0.9276 0.550 ## + Pmass 1 289.06 0.9276 0.555 ## + Abaxialstomataldensity 1 289.06 0.9287 0.590 ## + LeaTength 1 290.33 0.6088 0.640 ## + LeafLW 1 290.33 0.6088 0.670 ## + temp 1 291.56 0.3248 0.700 ## + Sclereidlength 1 291.17 0.4133 0.755 ## + NP 1 292.04 0.2216 0.755 ## + AbaxialSPI 1 291.17 0.4133 0.760 ## + Parea 1 292.42 0.1399 0.770 ## + rain 1 291.61 0.3155 0.795 ## + Sclereiddensity 1 292.62 0.0984 0.795 ## + Sclereidwidth 1 292.42 0.1399 0.810 ## + Elevation 1 291.98 0.2329 0.830 ## Call: rda(formula = clean_otus ~ ElevFactor + Isolation_Site, data ## = continuous_variables) ## ## Inertia Proportion Rank ## Total 4.764e+08 1.000e+00 ## Constrained 4.101e+08 8.609e-01 10 ## Unconstrained 6.628e+07 1.391e-01 4 ## Inertia is variance ## Some constraints were aliased because they were collinear (redundant) ## ## Eigenvalues for constrained axes: ## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 ## 144281813 114350177 45676639 33316295 25332594 19755697 14436883 ## RDA8 RDA9 RDA10 ## 9075306 3680233 185466 ## ## Eigenvalues for unconstrained axes: ## PC1 PC2 PC3 PC4 ## 37469443 20981954 7421009 405118 fullrda<-rda(clean_otus ~ Lineage+temp,data=variables) fullrda ## Call: rda(formula = clean_otus ~ Lineage + temp, data = variables) ## ## Inertia Proportion Rank ## Total 4.764e+08 1.000e+00 ## Constrained 1.271e+08 2.668e-01 3 ## Unconstrained 3.493e+08 7.332e-01 11 ## Inertia is variance ## ## Eigenvalues for constrained axes: 

##     RDA1     RDA2     RDA3
## 74165317 41776873 11157097
##
## Eigenvalues for unconstrained axes:
##       PC1      PC2      PC3      PC4      PC5      PC6      PC7
## 136608294 69549530 42479790 40263721 26121966 14545599 12295525
##     PC8     PC9   PC10   PC11
## 4360311 2479283 425476 139842
plot(fullrda,display = c("wa","cn"), scaling = 1)

###########################################
####Rarefied community Bray Distances RDA##
###########################################
###without leaf trait data
bray0 <- capscale(Bray_dist~ 1, data = non_leaf_variables )
bray1 <- capscale(Bray_dist ~ ., data=non_leaf_variables )
ordistep(bray0, scope = formula(bray1))
##
## Start: Bray_dist ~ 1
##
##              Df   AIC      F Pr(>F)
## + Elevation   1 30.01 1.2211  0.015 *
## + TempFactor  8 31.43 1.1480  0.015 *

## + temp 1 29.99 1.2387 0.020 * ## + ElevFactor 3 31.29 1.1392 0.025 * ## + Lineage 2 30.63 1.1961 0.035 * ## + Species 6 32.39 1.0905 0.095 . ## + ShanDiversity 1 30.12 1.1133 0.100 . ## + Isolation_Site 10 30.28 1.0261 0.245 ## + rain 1 30.19 1.0448 0.260 ## + TotalReads 1 30.18 1.0563 0.270 ## + sqrtReads 1 30.19 1.0513 0.335 ## + Island 2 31.23 0.9122 0.885 ## + Sample 14 -1028.63 0.0000 1.000 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Step: Bray_dist ~ Elevation ## ## Df AIC F Pr(>F) ## - Elevation 1 29.353 1.2211 0.03 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Df AIC F Pr(>F) ## + Lineage 2 30.93 1.2523 0.020 * ## + TempFactor 8 30.78 1.1000 0.095 . ## + Species 6 32.11 1.0904 0.125 ## + ShanDiversity 1 30.63 1.1547 0.130 ## + ElevFactor 3 31.93 1.0420 0.280 ## + Isolation_Site 10 28.10 0.9919 0.455 ## + TotalReads 1 30.77 1.0284 0.455 ## + rain 1 30.78 1.0242 0.460 ## + sqrtReads 1 30.77 1.0266 0.465 ## + Island 2 31.64 0.9416 0.745 ## + temp 1 31.00 0.8297 0.920 ## + Sample 13 -1021.49 0.0000 1.000 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Step: Bray_dist ~ Elevation + Lineage ## ## Df AIC F Pr(>F) ## - Lineage 2 30.006 1.2523 0.005 ** ## - Elevation 1 30.626 1.3177 0.005 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Df AIC F Pr(>F) ## + ShanDiversity 1 31.34 1.1157 0.310 ## + rain 1 31.37 1.0974 0.390 ## + Species 4 32.11 1.0076 0.420 ## + TempFactor 6 30.78 1.0401 0.425 ## + Isolation_Site 9 23.29 1.0056 0.475 ## + sqrtReads 1 31.51 0.9921 0.500 ## + ElevFactor 2 31.91 1.0017 0.510 ## + Island 2 31.89 1.0109 0.515 ## + TotalReads 1 31.47 1.0184 0.535 ## + temp 1 31.62 0.9124 0.735 ## + Sample 11 -1000.71 0.0000 1.000 ## Call: capscale(formula = Bray_dist ~ Elevation + Lineage, data = ## non_leaf_variables) ## 129

##               Inertia Proportion Rank
## Total          6.6360     1.0000
## Constrained    1.6949     0.2554    3
## Unconstrained  4.9411     0.7446   11
## Inertia is squared Bray distance
##
## Eigenvalues for constrained axes:
##   CAP1   CAP2   CAP3
## 0.5946 0.5658 0.5345
##
## Eigenvalues for unconstrained axes:
##   MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8   MDS9  MDS10
## 0.5946 0.5615 0.5303 0.5201 0.4946 0.4604 0.4483 0.4011 0.3341 0.3187
##  MDS11
## 0.2774
#chosen formula
brayrda<-capscale(formula = Bray_dist ~ temp + Lineage, data = non_leaf_variables)
anova(brayrda)
## Permutation test for capscale under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: capscale(formula = Bray_dist ~ temp + Lineage, data = non_leaf_variables)
##          Df SumOfSqs      F Pr(>F)
## Model     3    1.710 1.2729  0.002 **
## Residual 11    4.926
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
goodness(brayrda, summarize = TRUE, display="sites")
##  K.0101.P4     BI.245     K.0009     K.0115     K.0048     K.0006
## 0.13311142 0.24342030 0.26483191 0.07898768 0.33010399 0.12736497
##     K.0078     K.0055     BI.190     K.0036      EKH28     K.0007
## 0.51996345 0.54673771 0.06432239 0.55244266 0.12461252 0.16761594
##     BI.092       SP10     BI.226
## 0.52015427 0.03625287 0.15056920
colors.vec <- rainbow(8)
#plot
plot(brayrda, display = c("wa","cn"), scaling = 1, cex=3, type = "points")
points(brayrda, display = "sites", cex=2, pch = 21, scaling = 1,
       col = colors.vec[clean_meta_Traits_Env$Species],
       bg = colors.vec[clean_meta_Traits_Env$Species])
ordipointlabel(brayrda, display=c("wa","cn"), scaling = 1, add = T, cex = c(1,1))
#To color labels separately:
#ordipointlabel(brayrda, display=c("wa"), scaling = 1, add = T, cex = c(1,1))
#ordipointlabel(brayrda, display=c("cn"), scaling = 1, add = T, cex = c(1,1))
ordihull(brayrda, groups = clean_meta_Traits_Env$Lineage, scaling=1, col = "red")
legend("topright", legend = levels(clean_meta_Traits_Env$Species), bty = "n",
       col = colors.vec, pch = 21, pt.bg = colors.vec)

#Redundancy analysis including leaf trait data (McKown et al., 2016):

###with leaf trait data
leafbray0 <- capscale(Bray_dist~ 1, data = variables )
leafbray1 <- capscale(Bray_dist ~ ., data= variables )
ordistep(leafbray0, scope = formula(leafbray1))
##
## Start: Bray_dist ~ 1
##
##                          Df   AIC      F Pr(>F)
## + FEVdensity              1 29.98 1.2502  0.015 *
## + Totalveindensity        1 29.98 1.2414  0.015 *
## + Minorveindensity        1 30.02 1.2105  0.015 *
## + TempFactor              8 31.43 1.1480  0.015 *
## + Succulence              1 30.03 1.2021  0.020 *
## + Elevation               1 30.01 1.2211  0.025 *
## + Leafwidth               1 30.05 1.1814  0.025 *
## + temp                    1 29.99 1.2387  0.035 *
## + Parea                   1 30.03 1.1997  0.035 *
## + ElevFactor              3 31.29 1.1392  0.035 *
## + Lineage                 2 30.63 1.1961  0.040 *
## + LeafLW                  1 30.05 1.1786  0.045 *
## + AdaxialSPI              1 30.06 1.1683  0.055 .
## + Species                 6 32.39 1.0905  0.055 .
## + Adaxialstomataldensity  1 30.05 1.1766  0.065 .
## + Sclereidwidth           1 30.10 1.1340  0.070 .

## + CN 1 30.07 1.1560 0.075 . ## + Leafteeth 1 30.10 1.1336 0.085 . ## + X_13C 1 30.06 1.1725 0.090 . ## + Pmass 1 30.10 1.1350 0.090 . ## + LMA 1 30.11 1.1240 0.110 ## + Narea 1 30.10 1.1294 0.120 ## + Epidermaltrichomes 1 30.10 1.1284 0.120 ## + Nmass 1 30.11 1.1270 0.125 ## + Branchdiameters 1 30.11 1.1275 0.130 ## + Abaxialstomataldensity 1 30.12 1.1153 0.135 ## + ShanDiversity 1 30.12 1.1133 0.145 ## + Leafarea 1 30.15 1.0897 0.145 ## + Adaxialporelength 1 30.11 1.1194 0.175 ## + Height 1 30.14 1.0957 0.175 ## + Sclereiddensity 1 30.16 1.0787 0.175 ## + AbaxialSPI 1 30.18 1.0612 0.250 ## + Isolation_Site 10 30.28 1.0261 0.260 ## + NP 1 30.17 1.0659 0.270 ## + TotalReads 1 30.18 1.0563 0.310 ## + rain 1 30.19 1.0448 0.320 ## + sqrtReads 1 30.19 1.0513 0.360 ## + Abaxialporelength 1 30.21 1.0296 0.365 ## + SclereidLW 1 30.22 1.0203 0.395 ## + Petiolediameter 1 30.23 1.0128 0.425 ## + Sclereidlength 1 30.24 0.9990 0.480 ## + LeaTength 1 30.28 0.9639 0.635 ## + Island 2 31.23 0.9122 0.945 ## + Sample 14 -1028.63 0.0000 1.000 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Step: Bray_dist ~ FEVdensity ## ## Df AIC F Pr(>F) ## - FEVdensity 1 29.353 1.2502 0.02 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Df AIC F Pr(>F) ## + temp 1 30.46 1.2751 0.020 * ## + Elevation 1 30.49 1.2518 0.045 * ## + ElevFactor 3 31.54 1.1476 0.060 . ## + X_13C 1 30.56 1.1886 0.065 . ## + CN 1 30.57 1.1805 0.065 . ## + Succulence 1 30.54 1.2056 0.080 . ## + LeafLW 1 30.54 1.2068 0.085 . ## + TempFactor 8 30.60 1.1168 0.095 . 
## + Parea 1 30.58 1.1670 0.135 ## + Epidermaltrichomes 1 30.60 1.1490 0.160 ## + Sclereidwidth 1 30.60 1.1500 0.185 ## + LMA 1 30.62 1.1344 0.185 ## + ShanDiversity 1 30.64 1.1205 0.190 ## + Lineage 2 31.25 1.0970 0.195 ## + Pmass 1 30.63 1.1258 0.200 ## + Sclereiddensity 1 30.66 1.1010 0.200 ## + Abaxialstomataldensity 1 30.64 1.1152 0.215 ## + Isolation_Site 10 27.22 1.0680 0.220 ## + Leafarea 1 30.66 1.1019 0.230 ## + Nmass 1 30.65 1.1095 0.240 ## + Species 5 32.39 1.0535 0.250 ## + NP 1 30.67 1.0872 0.255 132

## + Adaxialstomataldensity 1 30.71 1.0545 0.325 ## + TotalReads 1 30.69 1.0734 0.330 ## + AbaxialSPI 1 30.69 1.0778 0.340 ## + rain 1 30.71 1.0559 0.350 ## + Abaxialporelength 1 30.72 1.0487 0.355 ## + sqrtReads 1 30.71 1.0557 0.400 ## + Totalveindensity 1 30.77 1.0038 0.490 ## + Adaxialporelength 1 30.76 1.0170 0.530 ## + Height 1 30.76 1.0135 0.535 ## + Leafteeth 1 30.77 1.0010 0.540 ## + Sclereidlength 1 30.79 0.9900 0.580 ## + Leafwidth 1 30.84 0.9474 0.660 ## + Narea 1 30.85 0.9392 0.690 ## + Minorveindensity 1 30.85 0.9366 0.715 ## + Branchdiameters 1 30.84 0.9432 0.730 ## + AdaxialSPI 1 30.87 0.9180 0.730 ## + Petiolediameter 1 30.88 0.9128 0.760 ## + Island 2 31.63 0.9299 0.770 ## + SclereidLW 1 30.86 0.9235 0.800 ## + LeaTength 1 30.92 0.8766 0.800 ## + Sample 13 -1042.08 0.0000 1.000 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Step: Bray_dist ~ FEVdensity + temp ## ## Df AIC F Pr(>F) ## - temp 1 29.975 1.2751 0.01 ** ## - FEVdensity 1 29.988 1.2859 0.01 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 ## ## Df AIC F Pr(>F) ## + Lineage 2 31.35 1.1534 0.135 ## + X_13C 1 30.88 1.2189 0.150 ## + CN 1 30.92 1.1936 0.150 ## + ShanDiversity 1 30.95 1.1616 0.175 ## + Abaxialstomataldensity 1 30.96 1.1559 0.220 ## + Nmass 1 31.01 1.1175 0.230 ## + LMA 1 30.96 1.1578 0.245 ## + Pmass 1 30.97 1.1503 0.245 ## + TempFactor 8 29.27 1.0725 0.250 ## + Parea 1 30.98 1.1401 0.270 ## + Isolation_Site 10 23.35 1.0185 0.365 ## + Species 5 32.10 1.0440 0.370 ## + Sclereiddensity 1 31.08 1.0634 0.385 ## + Abaxialporelength 1 31.06 1.0736 0.410 ## + Adaxialporelength 1 31.10 1.0481 0.415 ## + NP 1 31.10 1.0451 0.425 ## + Succulence 1 31.11 1.0329 0.440 ## + Epidermaltrichomes 1 31.08 1.0636 0.445 ## + Leafteeth 1 31.15 1.0048 0.450 ## + sqrtReads 1 31.12 1.0247 0.455 ## + rain 1 31.14 1.0149 0.455 ## + ElevFactor 3 32.03 1.0304 0.465 ## + Sclereidwidth 1 31.11 1.0355 0.470 ## + Leafarea 1 31.10 1.0420 0.475 ## + AbaxialSPI 1 31.16 0.9947 0.485 ## + Height 1 31.12 1.0282 0.490 ## + Sclereidlength 1 31.14 1.0142 0.490 ## + TotalReads 1 31.10 1.0414 0.500 133

## + LeafLW                  1   31.12 1.0244  0.505
## + Totalveindensity        1   31.13 1.0213  0.515
## + Adaxialstomataldensity  1   31.16 0.9985  0.525
## + Island                  2   31.83 0.9586  0.540
## + Minorveindensity        1   31.21 0.9571  0.595
## + AdaxialSPI              1   31.23 0.9415  0.625
## + SclereidLW              1   31.23 0.9388  0.680
## + Leafwidth               1   31.25 0.9238  0.680
## + Narea                   1   31.30 0.8863  0.690
## + Branchdiameters         1   31.23 0.9378  0.715
## + Leaflength              1   31.28 0.9041  0.730
## + Petiolediameter         1   31.28 0.9003  0.730
## + Elevation               1   31.35 0.8415  0.795
## + Sample                 12 -991.78 0.0000  1.000
## Call: capscale(formula = Bray_dist ~ FEVdensity + temp, data =
## variables)
##
##               Inertia Proportion Rank
## Total          6.6360     1.0000
## Constrained    1.1637     0.1754    2
## Unconstrained  5.4723     0.8246   12
## Inertia is squared Bray distance
##
## Eigenvalues for constrained axes:
##   CAP1   CAP2
## 0.5965 0.5672
##
## Eigenvalues for unconstrained axes:
##   MDS1   MDS2   MDS3   MDS4   MDS5   MDS6   MDS7   MDS8   MDS9  MDS10
## 0.6849 0.5735 0.5510 0.5050 0.4902 0.4831 0.4531 0.4493 0.4010 0.3325
##  MDS11  MDS12
## 0.3124 0.2362
#chosen formula
leafbrayrda<-capscale(formula = Bray_dist ~ FEVdensity + Pmass + temp, data = continuous_variables)
anova(leafbrayrda)
## Permutation test for capscale under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: capscale(formula = Bray_dist ~ FEVdensity + Pmass + temp, data = continuous_variables)
##          Df SumOfSqs      F Pr(>F)
## Model     3   1.6817 1.2447  0.003 **
## Residual 11   4.9542
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
goodness(leafbrayrda, summarize = TRUE, display="sites")
##  K.0101.P4     BI.245     K.0009     K.0115     K.0048     K.0006
## 0.20057177 0.08119963 0.33440408 0.02734698 0.38453335 0.32548319
##     K.0078     K.0055     BI.190     K.0036      EKH28     K.0007
## 0.44893761 0.50291137 0.02953361 0.49499781 0.02854446 0.39538872
##     BI.092       SP10     BI.226
## 0.43984029 0.05097916 0.02566620

#plot
plot(leafbrayrda, display = c("wa","cn"), scaling = 1, cex =3, type = "points")
colors.vec <- rainbow(8)
points(leafbrayrda, display = "sites", cex=2, pch = 21, scaling = 1,
       col = colors.vec[clean_meta_Traits_Env$Species],
       bg = colors.vec[clean_meta_Traits_Env$Species])
ordipointlabel(leafbrayrda, display=c("wa"), scaling = 1, add = T)
ordihull(leafbrayrda, groups = clean_meta_Traits_Env$Lineage, scaling=1, label = T, col="red")
legend("topright", legend = levels(clean_meta_Traits_Env$Species), bty = "n",
       col = colors.vec, pch = 21, pt.bg = colors.vec)

#Plotting maximum likelihood trees generated from cultured isolate LSU data:

###################################
##Plotting Cultured Isolate Trees##
###################################
library(ggtree)
##
## Attaching package: 'ggtree'
## The following object is masked from 'package:nlme':
##
##     collapse

## The following object is masked from 'package:tidyr':
##
##     expand
## The following object is masked from 'package:dplyr':
##
##     collapse
## The following objects are masked from 'package:raster':
##
##     flip, mask, rotate
library(ggplot2)
library(dplyr)
library(ape)
##
## Attaching package: 'ape'
## The following object is masked from 'package:ggtree':
##
##     rotate
## The following objects are masked from 'package:raster':
##
##     rotate, zoom
library(phylobase)
##
## Attaching package: 'phylobase'
## The following object is masked from 'package:ape':
##
##     edges
## The following object is masked from 'package:ggtree':
##
##     MRCA
#Read tree
read.tree("/Users/seanswift/Dropbox/Scaevola_FEF_Phylogenetics/Final LSU Trees/All Isolates/RAxML_Tree_1/RAxML_bipartitions.final2")-> Tree
ladderize(Tree)-> Tree
root(Tree, "Agaricostilbum_hyphaenes")->Tree
Tree$tip.label -> tips
gsub("__reversed_", "", tips)-> tips
as.data.frame(tips)-> tips
tips %>% rename(IsolateID=tips)-> tips
#Add bootstrap values
apeBoot(Tree, Tree$node.label) -> Tree2
#Add metadata
read.csv("OTU_meta.csv")-> metadata
tips %>% left_join(metadata) -> tip_table
## Joining, by = "IsolateID"
## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factors with different levels, coercing to character vector

#Setting up Tree plot
p <- ggtree(Tree2)+theme_tree2()
p <- p + geom_text(aes(label=bootstrap), size=2, nudge_x = -.008, nudge_y = 0.4)
p<- p %<+% tip_table+geom_tiplab(align = F, cex=3, aes(color = taxonomy6))+
  scale_x_continuous(expand = c(0.4, 0))
p<-p+theme(legend.position="left")
p
## Warning: Removed 383 rows containing missing values (geom_text).

ggtree(Tree2) + geom_text2(aes(subset=!isTip, label=node), hjust=-.3)

#Sordariomycetes
#highlight big tree for appropriate clade
p1<- ggtree(Tree2)
p1%<+% tip_table

p1<-p1+geom_hilight(node=659, fill="steelblue", alpha=.6)
#plot Sordariomycetes next to it
p2<-viewClade(p, node = 659)+ scale_x_continuous(expand = c(0.2, 0)) +labs(title="Sordariomycetes")
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
multiplot(p1,p2,ncol=2, widths = c(0.25,2))
## Warning: Removed 660 rows containing missing values (geom_text).

#Dothideomycetes
#highlight big tree for appropriate clade
p4<- ggtree(Tree2)
p4%<+% tip_table

p4<-p4+geom_hilight(node=471, fill="steelblue", alpha=.6)
#plot Dothideomycetes next to it
p5<-viewClade(p, node = 471)+ scale_x_continuous(expand = c(0.2, 0)) +labs(title="Dothideomycetes")
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
## Scale for 'x' is already present. Adding another scale for 'x', which
## will replace the existing scale.
#p<-p+theme(legend.position="right")
#p <- p + geom_text(aes(label=bootstrap, hjust=0, nudge_x = 5, show.legend = F, alpha = 0.8, size = 0.5))
multiplot(p4,p5,ncol=2, widths = c(0.25,2))
## Warning: Removed 667 rows containing missing values (geom_text).

##Glomerella Tree
#Read tree
read.tree("/Users/seanswift/Dropbox/Scaevola_FEF_Phylogenetics/Final LSU Trees/Glomerellales/Bigger_Glomerellales/RAxMLTree2/RAxML_bipartitions.final2")-> GlomTree
ladderize(GlomTree)-> GlomTree
root(GlomTree, "DQ522856_Lulworthia_grandispora")->GlomTree
GlomTree$tip.label -> tips
as.data.frame(tips)-> tips
tips %>% separate(tips, c("IsolateID","Junk"), sep="_", remove=F)-> glomtips
## Warning: Too many values at 36 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
## 11, 12, 13, 14, 15, 16, 17, 18, 21, 22, ...

#Add bootstrap values
apeBoot(GlomTree, GlomTree$node.label) -> GlomTree2

#Add metadata
glomtips %>% left_join(metadata) -> glomtip_table
## Joining, by = "IsolateID"
## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factor and character vector, coercing into character vector
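The "Too many values" warning from `separate()` arises because most tip labels contain more than one underscore, so the pieces after the second field are dropped. The following base-R sketch shows one way to keep only the text before the first underscore without triggering that warning. It is an illustration under assumptions, not the code used in the thesis: apart from the outgroup label, the tip labels are hypothetical.

```r
# Hypothetical tip labels; only the outgroup label appears in the original script.
tips <- c("DQ522856_Lulworthia_grandispora", "ISO1_sample_a", "ISO2")

# Keep everything before the first underscore as the isolate ID.
# Labels without an underscore are returned unchanged.
isolate_id <- sub("_.*$", "", tips)

# Pair each full label with its extracted ID, ready for a join on IsolateID.
glomtips_sketch <- data.frame(tips = tips,
                              IsolateID = isolate_id,
                              stringsAsFactors = FALSE)
print(glomtips_sketch)
```

Storing the IDs as character rather than factor would also avoid the coercion warning reported by `left_join()`.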

#Setting up Tree plot
p <- ggtree(GlomTree2)+theme_tree2()
p <- p + geom_text(aes(label=bootstrap), size=2, nudge_x = -.004, nudge_y = 0.4)
p<- p %<+% glomtip_table+geom_tiplab(align = F, cex=3, aes(color = taxonomy7))+
  scale_x_continuous(expand = c(0.2, 0))
p<-p+theme(legend.position=c(0.1,0.3))+labs(title="Glomerellales")
p
## Warning: Removed 45 rows containing missing values (geom_text).

#Annotating maximum likelihood trees with habitat abundance data:

#################
##Heatmap Trees##
#################

#Read SmallTree
read.tree("/Users/seanswift/Dropbox/Scaevola_FEF_Phylogenetics/Final LSU Trees/All Isolates/JustIsolates/RAxML_bipartitions.final2")-> SmallTree
ladderize(SmallTree)-> SmallTree
SmallTree$tip.label -> SmallTips
gsub("__reversed_", "", SmallTips)-> SmallTips
as.data.frame(SmallTips)-> SmallTips
SmallTips %>% rename(IsolateID=SmallTips)-> SmallTips
read.csv("OTU_meta.csv")-> metadata

#Add metadata
SmallTips %>% left_join(metadata) -> Small_tip_table
## Joining, by = "IsolateID"
## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factors with different levels, coercing to character vector

#Add bootstrap values
apeBoot(SmallTree, SmallTree$node.label) -> SmallTree2

#Add Ref OTU Info
SmallTips %>% left_join(metadata[c(2,16)]) -> SmallTips
## Joining, by = "IsolateID"
## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factors with different levels, coercing to character vector
SmallTips %>% rename(Ref=OTU)-> SmallTips

#Add heatmap data and join to tips
read.csv("/Users/seanswift/Dropbox/Illumina_Culture_OTUs/final_heatmap.csv", header=T, row.names = 1)->final_heatmap
SmallTips %>% left_join(final_heatmap) -> Small_heatmap_num
## Joining, by = "Ref"
## Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
## factors with different levels, coercing to character vector
Small_heatmap_num[is.na(Small_heatmap_num)]<- 0
row.names(Small_heatmap_num) <- Small_heatmap_num$IsolateID
Small_heatmap_num %>% select(-IsolateID) -> Small_heatmap_num
Small_heatmap_num -> Small_heatmap

#Count host species
Small_heatmap_num[32:38] %>% apply(1,sum)-> HostCount
HostCount %>% sort()%>% as.data.frame -> HostCount
HostCount %>% mutate(IsolateID = row.names(HostCount))-> HostCount
colnames(HostCount)<- c("HostSpecies","IsolateID")
p<-ggplot(data = HostCount, aes(x=reorder(IsolateID, HostSpecies), y = HostSpecies)) +
  scale_y_continuous(breaks = 1:7) +
  geom_bar(stat="identity") +
  coord_flip() +
  labs(title = "Cultured Isolate Host Specificity Based on Illumina Data")
p
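The host-count step sums, for each isolate, the host-species columns of the presence/absence table and then orders isolates by that total. A minimal base-R sketch of the same idea follows. It is illustrative only: the isolate names, host column names, and values are hypothetical, and the real table holds seven host columns.

```r
# Hypothetical presence (1) / absence (0) of three isolates across three host species.
hosts <- data.frame(
  host_1 = c(1, 1, 0),
  host_2 = c(1, 0, 0),
  host_3 = c(1, 0, 1),
  row.names = c("iso_a", "iso_b", "iso_c")
)

# Number of host species in which each isolate was detected.
host_count <- data.frame(
  IsolateID = row.names(hosts),
  HostSpecies = rowSums(hosts),
  stringsAsFactors = FALSE
)

# Order from the fewest hosts (most specialized) to the most hosts (most generalist).
host_count <- host_count[order(host_count$HostSpecies), ]
print(host_count)
```

Building the ID column at the same time as the counts keeps the two aligned, so no row names need to be recovered after sorting.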

#Convert data to Presence/Absence factors
Small_tip_table %>%
  mutate(IlluminaData = ifelse(rowSums(Small_heatmap_num[2:38]) >1, "Present", "Absent")) -> Small_tip_table
Small_heatmap[Small_heatmap == 0] <- "Absent"
Small_heatmap[Small_heatmap == 1] <- "Present"
Small_heatmap %>% select(-MaxElev, -MinElev, -Ref)-> Small_heatmap

#rearrange so that elevation is in correct order
Small_heatmap %>% select(1,4:7,2,3,8:35)-> Small_heatmap

#convert to factor
Small_heatmap %>% apply(.,2,as.factor) %>% as.data.frame() -> Small_heatmap_factor
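The conversion above recodes numeric detections as "Present"/"Absent" labels so that the heatmap is drawn with two discrete colors. The base-R sketch below shows that recoding on a small hypothetical table. The column names, values, and the `> 0` threshold are assumptions made for illustration; the thesis code uses `> 1` on the row sums of its own table.

```r
# Hypothetical 0/1 detection table: rows are isolates, columns are habitat categories.
detections <- data.frame(
  Low  = c(1, 0, 0),
  Mid  = c(1, 0, 1),
  High = c(0, 0, 0),
  row.names = c("iso_a", "iso_b", "iso_c")
)

# Flag isolates detected in at least one category (assumed threshold: > 0).
illumina_data <- ifelse(rowSums(detections) > 0, "Present", "Absent")

# Recode every cell as a two-level factor for a discrete heatmap.
detections_factor <- as.data.frame(
  lapply(detections, function(x) {
    factor(ifelse(x > 0, "Present", "Absent"), levels = c("Absent", "Present"))
  })
)
row.names(detections_factor) <- row.names(detections)
print(detections_factor)
```

Fixing the factor levels explicitly keeps both categories in the legend even when a column contains only absences.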

#Setting up SmallTree plot
p <- ggtree(SmallTree2)
p <- p + geom_text(aes(label=bootstrap), size=2, nudge_x = -.008, nudge_y = 0.4)
p <- p %<+% Small_tip_table+geom_tiplab(align = T, cex=3, aes(color = IlluminaData))
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`size`)
p <- p+theme(legend.position="right")
p <- gheatmap(p, Small_heatmap_factor, high = "blue4", low = "gray95", width=10,
              color = "gray", colnames_position = 'top', offset = 0.06)
#plot to pdf
p

## Warning: Removed 46 rows containing missing values (geom_text).

#Just Temperature
p2 <- ggtree(SmallTree2)
p2 <- p2 + geom_text(aes(label=bootstrap), size=2, nudge_x = -.008, nudge_y = 0.4)
p2 <- p2 %<+% Small_tip_table+geom_tiplab(align = T, cex=3, aes(color = IlluminaData))
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`size`)
p2 <- p2+theme(legend.position="right")
p2 <- gheatmap(p2, Small_heatmap_factor[8:16], high = "blue4", low = "gray95", width=10,
               color = "gray", colnames_position = 'top', offset = 0.06)
#plot to pdf
p2
## Warning: Removed 46 rows containing missing values (geom_text).

#Just Island
p3 <- ggtree(SmallTree2)
p3 <- p3 + geom_text(aes(label=bootstrap), size=2, nudge_x = -.008, nudge_y = 0.4)
p3 <- p3 %<+% Small_tip_table+geom_tiplab(align = T, cex=3, aes(color = IlluminaData))
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`size`)
p3 <- p3+theme(legend.position="right")
p3 <- gheatmap(p3, Small_heatmap_factor[26:28], high = "blue4", low = "gray95", width=10,
               color = "gray", colnames_position = 'top', offset = 0.06)
#plot to pdf
p3
## Warning: Removed 46 rows containing missing values (geom_text).

#Just Species
p4 <- ggtree(SmallTree2)
p4 <- p4 + geom_text(aes(label=bootstrap), size=2, nudge_x = -.008, nudge_y = 0.4)
p4 <- p4 %<+% Small_tip_table+geom_tiplab(align = T, cex=3, aes(color = IlluminaData))
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`size`)
p4 <- p4+theme(legend.position="right")
p4 <- gheatmap(p4, Small_heatmap_factor[29:35], high = "blue4", low = "gray95", width=4,
               color = "gray", colnames_position = 'top', offset = 0.1)
#plot to pdf
p4
## Warning: Removed 46 rows containing missing values (geom_text).

#Just Elevation
p5 <- ggtree(SmallTree2)
p5 <- p5 + geom_text(aes(label=bootstrap), size=2, nudge_x = -.008, nudge_y = 0.4)
p5 <- p5 %<+% Small_tip_table+geom_tiplab(align = T, cex=3, aes(color = IlluminaData))
## Warning: The plyr::rename operation has created duplicates for the
## following name(s): (`size`)
p5 <- p5+theme(legend.position="right")
p5 <- gheatmap(p5, Small_heatmap_factor[1:7], high = "blue4", low = "gray95", width=4,
               color = "gray", colnames_position = 'top', offset = 0.1)
#plot to pdf
p5
## Warning: Removed 46 rows containing missing values (geom_text).