The Pennsylvania State University
The Graduate School

The Huck Institutes of the Life Sciences

GENOMICS OF GENOTYPE-BY-ENVIRONMENT INTERACTIONS IN SHRUB WILLOW (SALIX SPP.): INSECT HERBIVORY AND SOIL MICROBIOMES

A Dissertation in Biology by Wanyan Wang

© 2018 Wanyan Wang

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2018

The dissertation of Wanyan Wang was reviewed and approved* by the following:

John E. Carlson, Professor of Molecular Genetics, Dissertation Advisor, Chair of Committee

Yinong Yang, Professor of Plant Pathology

Mary Ann Victoria Bruns, Associate Professor of Soil Science and Microbial Ecology

Surinder Chopra, Professor of Maize Genetics

Teh-hui Kao, Distinguished Professor of Biochemistry and Molecular Biology, Chair of the Plant Biology Graduate Program

*Signatures are on file in the Graduate School


ABSTRACT

Shrub willow (Salix spp.), a perennial short rotation woody biomass crop, has superior properties for bioenergy production: a short harvest cycle, high yield and adaptability to a wide range of site conditions, a high net energy ratio, low demand for fertilizer and management, and favorable environmental impacts such as soil conservation and support for biodiversity. The aim of my research is to use advanced, genomics-based techniques to facilitate the breeding of new willow cultivars with improved and consistent yield across a wide variety of sites in the Northeastern United States, as well as resistance to pests and diseases. Understanding the interactions between environmental factors and shrub willow will be important for optimizing willow growth conditions and will also aid in developing improved cultivars that are better adapted to particular environments.

This dissertation has two overall objectives:

1. Use RNA-Seq to capture the transcriptome dynamics of both resistant and susceptible willow species under infestation by the insect herbivore potato leafhopper, and ultimately elucidate the defense mechanism(s) and resistance genes/pathways of shrub willow against this pest.

2. Compare rhizosphere microbial communities from different geographic locations and willow genotypes to identify which factors shape rhizosphere microbiome structure and how the microbiome affects willow biomass yield.


Table of Contents

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
CHAPTER 1 Introduction to the Shrub Willow and Related Research
1.1 INTRODUCTION TO ENERGY CROP
1.1.1 Global energy crisis and bioenergy market
1.1.2 Biomass as “carbon neutral” energy source
1.1.3 Dedicated energy crops
1.2 BIOLOGY OF THE SHRUB WILLOW
1.2.1 Ecology, population and genetic structure
1.2.2 Commonly studied Salix species
1.2.3 Genomics studies on willows
1.3 USE OF THE SHRUB WILLOW IN ENERGY INDUSTRY
1.4 USE OF SHRUB WILLOW IN OTHER ENVIRONMENTAL PROJECTS
CHAPTER 2 Transcriptome Analysis of Contrasting Resistance to Herbivory by Empoasca fabae in Two Shrub Willow Species and Their Hybrid Progeny
ABSTRACT
ABBREVIATIONS
INTRODUCTION
RESULTS
Phenotypic responses of shrub willow to PLH attack in greenhouse and field trials
RNA sequencing and quality assessment
Differentiation of parent transcriptomes based on genotype and defense-response timing
Identification of differentially expressed genes between parents from RNA-Seq data
Weighted gene correlation network analysis (WGCNA) identified three clusters of genes associated with specific resistance mechanisms in the parents and hybrid progeny
Functional annotation enrichment analysis of gene cluster I (darkgreen)
Functional annotation enrichment analysis of gene cluster II (magenta)
Functional annotation enrichment analysis of gene cluster III (black)
PLH-resistance associated transcription factor genes and their regulatory networks
Pairwise comparison of parents’ Time-0 transcriptomes
Dominance accounts for a large majority of the differential expression among F1 progeny
DISCUSSION
Biosynthesis of secondary cell wall compounds as compensation for PLH injury
Constitutive resistance or priming effects?
High correlations between NBS-LRR R genes and PLH-resistance and sex
CONCLUSION
METHODS
Plant and pest materials
Greenhouse no-choice feeding experiment
Phenotype measurements
RNA extraction and sequencing
Calculation and quantification of gene expression abundance
Differential gene expression analysis
Weighted gene correlation network analysis (WGCNA)
Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis
Inheritance of gene expression
TABLES
FIGURES
SUPPLEMENTARY TABLE AND FIGURES
CHAPTER 3 Comparative Metagenomics Reveals the Effects of Geography and Host Genotype on Willow Rhizosphere Microbial Community
ABSTRACT
INTRODUCTION
Factors determining rhizosphere community composition
Plant Growth Promoting Rhizobacteria (PGPR)
RESULTS
Data characteristics and quality assessment
Willow cultivation, geographic location and willow genotype all affect soil microbial community structure
Microbial diversity in the soil microbiome is influenced by willow planting and geographic location
Relative abundances of the dominant taxa vary on geographic scale
‘Core Microbiome’ in willow rhizosphere
‘Functional Core’ maintained in stably-established willow trials
Promising plant-growth-promoting microbes associated with willow biomass yield
DISCUSSION
The impact of willow planting on dynamics of copiotrophic and oligotrophic phyla
PGPRs and their application in willow breeding and growing
CONCLUSION
METHODS AND MATERIALS
Field sites and plant material
Soil sampling and collection
Soil DNA extraction and sequencing
Sequence processing and data analysis
Microbiome comparison and statistical analysis
TABLES
FIGURES
SUPPLEMENTARY TABLES AND FIGURES
CHAPTER 4 Summary and Future Prospects
References


LIST OF FIGURES

Figure 2.1 Greenhouse phenotypic measurements of stem elongation rate and damage visual scoring
Figure 2.2 PCA plot of all RNA-Seq libraries of the two parent genotypes at 4 time points
Figure 2.3 Venn diagrams of up- and down-regulated differentially expressed genes (DEGs) separately identified via pairwise comparisons of PLH-injured and uninjured libraries at all time points for each parent genotype
Figure 2.4 Co-expression analyses of all genes across all samples
Figure 2.5 The internal regulations and connectivity network among genes in the darkgreen co-expressed gene cluster
Figure 2.6 Overview distribution of log2 fold change of gene expression between parental species (S. purpurea vs. S. viminalis) by MapMan
Figure 2.7 Significance of eigengene-trait relationship correlations between leaf damage severity and co-expressed gene modules among F1 S. purpurea × S. viminalis progeny individuals
Figure S2.1 Hierarchical tree graphs of over-represented GO (Gene Ontology) terms in biological process categories for co-expressed genes in the darkgreen, magenta and black modules
Figure S2.2 Comparison of the plant hormone signal transduction initiations between two parent genotypes, as response to PLH attack
Figure S2.3 Chromosome-wide patterns of inheritance of gene expression in F1 S. purpurea × S. viminalis progeny individuals
Figure S2.4 Manhattan plot of genome-wide distribution of significance of sex-biased gene expression in F1 S. purpurea × S. viminalis progeny individuals
Figure S2.5 Distribution of inheritance classes of genes among F1 S. purpurea × S. viminalis progeny individuals
Figure 3.1 Geographic map of locations of the three willow trials in the Northeastern United States in this study
Figure 3.2 Principal coordinate plots for total metagenomic DNA data for all samples, generated using the Bray–Curtis distance on a) species level and b) functional level
Figure 3.3 Boxplots displaying distributions of Shannon diversity indexes of each soil microbiome grouped by different geo-locations and collection time
Figure 3.4 A bar chart representation of microbiome community composition profiles at the phylum level for all soil DNA samples
Figure 3.5 Heatmap of relative abundances of core microbial genera that were observed in the shrub willow rhizosphere microbiome samples only at the time of harvest (vs. pre-planting)
Figure 3.6 Heatmap of the relative abundances of each of the gene GO functional attributes (at functional annotation KEGG pathway) across all shrub willow rhizosphere microbiomes
Figure 3.7 Heatmap of Pearson’s correlation values comparing abundances of the top 100 genera (x axis) with genetic or environmental variables
Figure S3.1 Field trial planting designs for Rock Springs (Panel A), Fredonia (Panel B) and Mylan Park (Panel C)
Figure S3.2 Bar plot of willow biomass yield of each of the 12 willow cultivars on 2 different geographic sites
Figure S3.3 Taxonomic rarefaction curves for A) willow rhizosphere soil samples and B) pre-planting soil communities
Figure S3.4 Boxplot of distribution of Shannon diversity indexes of each willow rhizosphere soil microbiome grouped by different geo-locations and host genotype
Figure S3.5 Top 100 biomarker candidates that distinguished on relative abundance between Fredonia (right) and Rock Springs (left)
Figure S3.6 A bar chart representation of microbiome community composition profiles at the family level for all samples


LIST OF TABLES

Table 2.1 Functional annotation of 10 hub transcription factors in the WGCNA darkgreen cluster II
Table 2.2 GO term enrichment of differentially expressed genes between transcriptomes of parent genotypes S. purpurea 94006 and S. viminalis ‘Jorr’ at time 0
Table S2.1 Inheritance of global gene expression patterns among all F1 S. purpurea × S. viminalis progeny individuals
Table 3.1 Results of permutational analysis of similarities (ANOSIM) tests using Bray–Curtis distances of the taxonomically annotated metagenomics data of each soil sample at the species level
Table 3.2 The 11 genera showing the highest correlation with willow biomass yield that are known to include species reported in previous studies to be plant growth promoting rhizobacteria
Table 3.3 Comparison of the most common genera of microbes observed among willow rhizosphere microbiomes and pre-planting soil microbiomes
Table S3.1 Pedigree metadata of twelve willow genotypes in this study


ACKNOWLEDGEMENTS

There have been a lot of people who have accompanied me and helped me through my PhD journey. I would like to express my gratitude and appreciation to all of them.

First and foremost, I would like to thank my advisor Dr. John Carlson. During my PhD study, Dr. Carlson was always supportive and patient with my research and my self-development. He guided me to become a qualified researcher and cared about every bit of progress I made. His passion and persistence in science encouraged me throughout my graduate study. I feel lucky to have worked with and been guided by him, which made this a rewarding and enjoyable time in my life.

I am also very thankful to my committee members, Dr. Yinong Yang, Dr. Mary Ann Bruns and Dr. Surinder Chopra. They gave me many good suggestions and insights on my research and motivated me to pursue interesting questions in my study. Dr. Yang’s rich knowledge of plant defense systems helped me to better understand the interaction between pest and plant. He also gave me precious advice for my career planning and introduced me to many job opportunities. Dr. Bruns was very instrumental in guiding me on my metagenomics project. She was always prompt in answering my questions and giving me suggestions, which led me to think more deeply about my findings. Dr. Chopra always brought up good suggestions and insightful thoughts during my committee meetings and directed me to ask interesting questions for the next step in my research. Here I also want to thank the Graduate Program of Plant Biology, and Dr. Teh-hui Kao, who recruited me to be part of our wonderful program at Penn State.

My research was supported by the Northeast Woody/Warm-season Biomass Consortium (NEWBio), which was funded by the USDA National Institute of Food and Agriculture. It has been a great collaborative experience to work together with researchers from many different backgrounds.

I also want to thank everyone in our lab. They have been incredibly supportive and were always there to lend a helping hand.

Finally, I want to thank all my friends and family for their constant love, support and encouragement, which have kept me moving forward in my study and in my life.


CHAPTER 1

Introduction to the Shrub Willow and Related Research


1.1 Introduction to Energy Crop

1.1.1 Global energy crisis and bioenergy market

According to the Monthly Energy Review of the US Energy Information Administration (EIA), global energy demand has grown rapidly since 1950, and this demand is mostly met by fossil fuels such as petroleum and coal (EIA, 2017). With growing populations and continued industrialization, energy demand is expected to double or perhaps triple during this century. Security of energy supply is therefore a global issue, and it is vital to develop and use energy from sustainable resources. The “new” renewables (e.g. solar, wind, and biofuel) have been growing fast from a very low base. Biomass, derived from organic matter such as field waste, residues and bioenergy crops, can be converted into renewable energy in various ways: used directly for heating or electricity production, or converted into gaseous or liquid fuels (IEA, 2017). Bioenergy currently provides roughly 5% of global energy supply and accounts for roughly 50% of the energy derived from renewable resources. Use of renewable energy has increased significantly since 2001, and among all available renewable resources, biomass plays a leading role, growing around 2.5% per year since 2010 (IEA, 2016). The US Department of Energy (DOE) has developed a National Biofuels Action Plan (https://www.afdc.energy.gov/pdfs/nbap.pdf, accessed 01/03/2018), with a national goal of producing 36 billion gallons of biofuel per year by 2022. Meeting this goal in an environmentally sound and sustainable way will require a dramatic increase in the total production of biofuel feedstocks on limited land (Richard 2010).

1.1.2 Biomass as “carbon neutral” energy source

The sharp increase in fossil fuel use in transportation, agriculture and forestry, industry, electricity production, and the commercial and residential sectors has had a severe consequence: global climate change. The combustion of petroleum and coal releases large amounts of carbon dioxide into the atmosphere, raising the atmospheric CO2 concentration from less than 320 ppm (parts per million) in the 1960s to over 400 ppm by 2015 (Fulton et al., 2015). Rising temperatures could alter landscapes and wildlife habitats, raise sea levels, increase weather disasters and heighten the risk of disease. As atmospheric CO2 levels rise and evidence of climate change accumulates, we should anticipate stricter controls on greenhouse gas emissions in the near future and the need to replace fossil fuels with renewable biomass as an energy feedstock. Biomass is our only renewable source of carbon-based fuel and is considered “carbon neutral”: assuming bioenergy crops are replanted as soon as they are harvested and burned, the replanted crops take up the carbon dioxide released by combustion, keeping the carbon cycle in balance so that no extra carbon dioxide is added to the atmosphere. Because nothing currently offsets the carbon dioxide produced by burning fossil fuels, using biomass instead of fossil fuels will reduce greenhouse gas emissions. In addition, a recent study of land-use change from grassland to short rotation coppice willow showed significantly reduced GHG emissions (Harris et al., 2016).
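The carbon-neutrality argument in this section amounts to a simple carbon bookkeeping identity. In the sketch below, C_released is the carbon emitted when the fuel is burned and C_uptake is the carbon fixed by regrowth of the replanted crop over the same cycle; this notation is introduced here for illustration only. It deliberately ignores fossil energy used for cultivation, harvest and transport, which is why real biomass systems are only approximately carbon neutral.

```latex
\begin{aligned}
\Delta C_{\mathrm{atm}} &= C_{\mathrm{released}} - C_{\mathrm{uptake}} \\
\text{biomass, replanted after harvest:}\quad C_{\mathrm{uptake}} \approx C_{\mathrm{released}}
  &\;\Rightarrow\; \Delta C_{\mathrm{atm}} \approx 0 \\
\text{fossil fuel, no offsetting regrowth:}\quad C_{\mathrm{uptake}} = 0
  &\;\Rightarrow\; \Delta C_{\mathrm{atm}} = C_{\mathrm{released}}
\end{aligned}
```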

1.1.3 Dedicated energy crops

Cellulosic biomass provides a sustainable, low-cost alternative to petroleum for liquid fuels, helping to address energy security, environmental pollution and economic prosperity. The polysaccharides cellulose and hemicellulose, which make up the majority of biomass, can be hydrolyzed to sugars and fermented to ethanol, and the lignin can be burned to supply bio-heat and bio-power. Currently, the primary feedstock for ethanol produced as biofuel in the U.S. is corn, which occupies around 33 million acres of cropland. Corn also requires more management inputs (pest and disease control, fertilization, etc.), which diminish its energy return on investment (EROI). In contrast, second-generation lignocellulosic bioenergy crops show significant advantages as replacements for petroleum and for ‘first-generation’ biofuels, which are produced from food crops. These second-generation crops, planted to harvest biomass exclusively for bioenergy, are called “dedicated energy crops” (DECs) and are considered a renewable and sustainable energy source. They include short-rotation woody perennials such as poplar (Populus) (Tuskan et al., 2004; Tuskan et al., 2006) and shrub willow (Salix) (Smart et al., 2007; Smart and Cameron, 2012), and perennial grasses such as switchgrass (Panicum virgatum) (Mitchell et al., 2012) and miscanthus (Heaton et al., 2008) for lignocellulosic ethanol, as well as oil-producing feedstocks such as the perennial oilseed jatropha (Jatropha curcas L.) (Openshaw 2000) and algae (Scott et al.) for biomass-derived biodiesel production. These perennial feedstocks have several advantages over annual crops like corn: they require less annual investment for planting and establishment; they produce more biomass with low management input and harvest cost; and they benefit the environment by stabilizing soil, reducing soil erosion, improving water quality and providing wildlife habitat. However, it is unlikely that any single species will be a universal feedstock for the biofuel industry; a dedicated energy crop must be shown to be adapted to the climate, profitable, and acceptable to local farmers, biorefineries and markets before it is widely promoted as a bioenergy feedstock (Mitchell et al., 2016). In Europe, willow is widely planted in the north, whereas in southern Europe, poplar coppicing is of greatest interest. In North America, willow adapts well to the cool, moist climates of the Northeastern United States and southern Canada, and grows fast enough to compensate for the short growing seasons of this area. Northern Asian countries, including China, Japan, and Russia, have abundant willow genetic resources, providing a large material basis for bioenergy development. Many regions have shown great interest in, and prospects for, deploying willow coppicing (Shield et al., 2015).
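EROI is conventionally defined as the ratio of the energy a fuel delivers to the energy invested in producing it. Writing it out, with symbols introduced here for illustration, shows why the extra management inputs for corn mentioned above (fertilizer, pest and disease control) lower its EROI when the energy output is held fixed:

```latex
\mathrm{EROI} = \frac{E_{\mathrm{out}}}{E_{\mathrm{in}}},
\qquad
\frac{E_{\mathrm{out}}}{E_{\mathrm{in}} + \Delta E_{\mathrm{mgmt}}} < \frac{E_{\mathrm{out}}}{E_{\mathrm{in}}}
\quad \text{for } \Delta E_{\mathrm{mgmt}} > 0
```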

1.2 Biology of the Shrub Willow

1.2.1 Ecology, population and genetic Structure

Salix and Populus are two genera of the family Salicaceae s. str. in the order Salicales, class Magnoliopsida, subclass Dilleniidae (Karp 2014); both are of interest for biofuel and bioenergy production when grown under a short rotation coppice (SRC) system (Kuzovkina et al., 2008). Salix is hard to classify at the species level because of its phenotypic variability and plasticity, interspecific hybridization and polyploidy (Kuzovkina et al., 2008). The genus Salix comprises 450–520 species of trees and shrubs, whose habitats range from temperate to arctic regions of the Northern Hemisphere (Argus, 2010). The authoritative morphology-based classification of Salix divides the genus into four subgenera: Salix, Longifoliae Andersson, Vetrix Dumort., and Chamaetia Nasarow (Argus 1997, 1999; Skvortsov 1968, 1999). Subgenus Longifoliae contains only a few species, all originating in the Americas (New World species), represented by S. exigua, which has stomata on both the upper and lower leaf surfaces. Subgenus Salix consists mostly of tree-type species, similar to Populus; these are not widely used in commercial cultivation because of hybridization barriers between subgenera Salix and Vetrix. Subgenus Vetrix contains the greatest number of species in the genus (more than two-thirds) and is the most exploited for commercial breeding because most of its species tend to grow fast. Most Salix species are diploid (2n = 38), but ploidy levels range from diploid to dodecaploid (Fogelqvist et al., 2015). Interspecific hybridization, whether natural or through breeding, occurs readily, at least at the subgenus level. However, most species of subgenus Salix do not hybridize with species from other subgenera. The hybrids usually show heterosis, performing better than their parents (Kuzovkina et al., 2008).
Diversity in willow genetic resources provides potential for developing improved cultivars with desirable traits like increased yield, lower incidence of rust infection (Melampsora spp.), high water use efficiency, and better adaptation to marginal land (Stanton et al., 2014).


Historically, willow was cultivated mainly for basketry and weaving, and for stabilizing riverbanks and waterways (Kuzovkina et al., 2008); more recently, however, willow has shown promise as a sustainable source of biomass for the bioenergy and biofuel industries. In 2009, willow plantations in the UK were estimated to cover 7,400 hectares (Easson et al. 2011), with England making up the largest share of the planted area. In Sweden, willow plantings covered 16,000 hectares by 2000 (Larsson et al., 2003). In the Northeastern United States, more than 400 hectares of willow had been planted by 2008 (Shield et al., 2015).

1.2.2 Commonly studied Salix species

S. viminalis is the most common willow species in Europe, where it is widely distributed and propagated both by seed and by cuttings. It carries many traits valuable for biomass production, so it is commonly hybridized with other species to develop new varieties. S. schwerinii is strongly resistant to Melampsora larici-epitea, a common willow pathogen and a major threat to willow cultivation that causes severe leaf rust and heavy biomass yield losses (Samils et al., 2011). Both S. viminalis and S. schwerinii are widely used in breeding; commercial cultivars with mixed S. viminalis and S. schwerinii genetic backgrounds combine fast growth with rust resistance and are adapted to growing conditions in Sweden and northern Europe. In the US, S. purpurea is commonly used in breeding to introduce resistance to the potato leafhopper, and two other species, S. eriocephala and S. miyabeana, are also widely used (Kuzovkina et al. 2008). S. eriocephala, a species native to North America, is notable for its extensive north–south distribution and for its early autumn senescence, which helps it survive cold winters across that range. Its high biomass suggests potential for cultivar commercialization (Lauron-Moreau et al., 2013).

1.2.3 Genomics studies on willows

Before much genetic information was available for shrub willow, traits of interest, such as stem height and diameter, stool (sprout) number, plant width and responses to environmental stresses such as pests, disease and drought, were studied and selected by phenotypic measurement alone. Crosses generated hybrid progeny with large phenotypic variation. Through phenotypic screening and recurrent selection, traditional breeding programs achieved great progress, increasing willow yield by 60% and lowering the incidence of leaf rust and pest damage (Kuzovkina et al. 2008). Genetic studies of shrub willow, including quantitative trait locus (QTL) mapping of valuable traits, genetic linkage mapping and genome sequencing (S. purpurea), have been conducted since the 1990s (Karp 2014). QTL analysis is a classical method for explaining the genetic basis of quantitative variation in phenotypic traits by linking that variation to specific chromosomal loci (Lander and Botstein 1989). Genetic markers located within the identified loci can be used in marker-assisted selection (MAS), a faster and more convenient breeding approach. QTL mapping revealed a single locus with a major effect and several loci with minor effects on leaf rust resistance (Hanley 2003; Tsarouhas et al. 2003). In addition, several other QTL mapping populations have been developed in the UK and US and used to identify QTLs for herbivorous pest resistance (Rönnberg-Wästljung et al. 2006), growth traits (Rönnberg-Wästljung et al., 2005), freezing resistance and phenology (Tsarouhas et al. 2004), and water-use efficiency and drought tolerance (Rönnberg-Wästljung et al. 2005; Weih et al. 2006). Beyond these yield-related traits, traits relevant to the bioenergy industry have also been considered. The composition and proportions of carbohydrates within biomass can affect its conversion to energy. High cellulose and hemicellulose contents and low lignin content are targets of willow improvement for more efficient enzymatic hydrolysis and fermentation (Lee et al., 2012; Ray et al., 2012). The ash and moisture content of willow biomass can also affect the efficiency of energy conversion from the feedstock or damage bioenergy boilers. Consistent biomass traits provide a steady, predictable base for the bioenergy production pipeline. In addition, high levels of sulfur (S) and nitrogen (N) in willow biomass chips can increase the SOx and NOx pollutants in combustion exhaust (Karp 2014).
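The idea behind QTL analysis, testing whether trait variation tracks the genotype at a locus, can be illustrated with its simplest form, single-marker regression. At each marker, a model with one overall trait mean is compared with a model that fits a separate mean per genotype class, and the improvement is expressed as a LOD score. The pure-Python sketch below assumes a mapping population in which each marker is scored as two genotype classes (coded 0/1); the function name and toy data are hypothetical. This is a simpler technique than the interval mapping of Lander and Botstein (1989), which also models putative QTL positions between markers.

```python
import math
from typing import Dict, List, Sequence


def single_marker_lod(genotypes: Sequence[int], phenotypes: Sequence[float]) -> float:
    """LOD score for one marker from single-marker regression.

    Null model: one overall mean (no QTL at this marker).
    Alternative: a separate mean for each marker genotype class.
    LOD = (n / 2) * log10(RSS0 / RSS1).
    """
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)

    # Group phenotypes by the genotype class observed at this marker.
    classes: Dict[int, List[float]] = {}
    for g, y in zip(genotypes, phenotypes):
        classes.setdefault(g, []).append(y)

    # Residual sum of squares around each class mean.
    rss1 = 0.0
    for ys in classes.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)

    if rss1 == 0.0:
        return math.inf  # genotype classes explain all trait variation
    return (n / 2) * math.log10(rss0 / rss1)


# Hypothetical toy data: 6 progeny, one trait (e.g. a rust score), 2 markers coded 0/1.
phenotype = [5.0, 5.2, 4.9, 7.1, 6.8, 7.0]
markers = {
    "marker_A": [0, 0, 0, 1, 1, 1],  # co-segregates with the trait
    "marker_B": [0, 1, 0, 1, 0, 1],  # unrelated to the trait
}

for name, geno in markers.items():
    print(name, round(single_marker_lod(geno, phenotype), 2))
```

With this toy data, marker_A, which co-segregates with the trait, receives a much higher LOD than marker_B. In practice, a genome-wide significance threshold (often around LOD 3) is applied, and permutation tests are commonly used to set it.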

Because Populus and Salix are closely related phylogenetically, genetic information from the poplar genome was initially used as a basis for positioning QTLs and constructing willow genetic maps. Genetic linkage maps of willow (S. viminalis and S. leucopithecia × S. erioclada L.) have been created from AFLP and SSR markers (Hanley et al. 2002; Hu et al. 2011), as well as from SNP markers obtained by genotyping-by-sequencing (GBS) (Elshire et al. 2011). Furthermore, whole genomes of shrub willow have been sequenced with next-generation sequencing technology for S. suchowensis (Dai et al., 2014) and S. purpurea L. (“Salix purpurea v1.0, DOE-JGI”, 2015). The availability of Salix genome sequences will help in developing low-cost, high-throughput and very dense genome-wide markers for genetic mapping and genomic selection by genotyping-by-sequencing (Myles et al. 2010). Knowledge of genetic diversity is a prerequisite for breeding programs aimed at improving biomass production. Using genome-wide molecular markers as direct predictors of phenotypic traits saves time on phenotypic screening and trait selection, especially for traits expressed at later developmental stages, such as biomass yield. The genetic and genomic resources for shrub willow accrued through this extensive work have not only expedited willow breeding for improved growth and adaptation but also facilitated biological understanding of woody plants.
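Using genome-wide markers as direct predictors of phenotype is usually framed as regressing the trait on all marker genotypes at once. The pure-Python sketch below shows one common approach, ridge regression (the idea behind rrBLUP-style genomic prediction), which shrinks all marker effects toward zero. It assumes markers coded as allele dosage 0/1/2 in a diploid and a hand-picked shrinkage parameter `lam`; the function names and toy data are hypothetical, and this is not the pipeline used in this dissertation. Real analyses use thousands of GBS markers and choose `lam` from variance components or by cross-validation.

```python
from typing import List, Sequence, Tuple

# (phenotype mean, marker column means, marker effects)
Model = Tuple[float, List[float], List[float]]


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> Model:
    """Estimate marker effects by ridge regression on centred genotypes.

    Solves (Xc'Xc + lam*I) b = Xc'(y - mean(y)), where Xc is X with each
    marker column centred. lam > 0 shrinks all effects toward zero.
    """
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - mu for v in y]

    # Augmented normal-equation matrix [A | r].
    M = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] + [sum(Xc[k][i] * yc[k] for k in range(n))]
         for i in range(p)]

    # Gaussian elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            f = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= f * M[col][c]

    # Back-substitution for the marker effects.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return mu, col_means, b


def predict(model: Model, x: Sequence[float]) -> float:
    """Genomic estimated value for a new individual's marker genotypes."""
    mu, col_means, b = model
    return mu + sum((xj - mj) * bj for xj, mj, bj in zip(x, col_means, b))


# Hypothetical training set: 6 plants x 2 markers; yield = 10 + 2 * (marker 1 dosage).
X_train = [[0, 1], [1, 0], [2, 1], [0, 2], [1, 1], [2, 0]]
y_train = [10 + 2 * row[0] for row in X_train]
model = fit_ridge(X_train, y_train, lam=0.01)
print(model[2])                # marker effects close to [2, 0]
print(predict(model, [2, 1]))  # close to 14
```

Once trained, such a model lets breeders rank untested seedlings from their genotypes alone, which is what saves time for late-expressed traits such as biomass yield.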

1.3 Use of the Shrub Willow in Energy Industry

Willow began to gain notice as a short rotation coppice (SRC) feedstock for renewable bioenergy in the 1980s, as fuel costs surged (Karp 2014). It was first bred in the UK (Lindegaard & Barker, 1997) and Sweden (Larsson, 1998). In the 1990s, the Northeast willow breeding program at the State University of New York (SUNY) began breeding shrub willow in the United States, generating a collection of natural willow species and controlled hybrids in order to identify and exploit species and individual genotypes with traits well suited to environmental conditions in the Northeastern US (Kopp et al., 2001; Smart et al., 2005; Smart & Cameron, 2012). Over the past decades, research and plantation experiments have demonstrated the growth potential of shrub willows in the Northeast and Midwest of the US (Tolbert & Schiller 1996; Tolbert & Wright 1998; Volk et al. 2006; Wojnar & Rutzke 2010). Compared with other DECs, perennial woody crops grow better on the rocky and sloped soils typical of the northern U.S. and tolerate wet springs and occasional summer drought. Shrub willow has multiple advantages as a biomass feedstock: a short harvest cycle, high yield and adaptability to a wide range of site conditions, a high net energy ratio, low demand for fertilizer and management, and favorable environmental impacts (Cameron et al., 2008; Smart and Cameron 2012). Biomass yields from shrub willow can reach 10–15 dry Mg ha⁻¹ yr⁻¹, far ahead of other biomass crops in the temperate zone. Willow plantations are established from stem cuttings and harvested on a 2- to 4-year cycle. Coppicing after each harvest reinvigorates growth, sustaining production for more than 20 years (Shield et al. 2015). Over their perennial cycle, shrub willows redistribute their nutrients and thus require little fertilization (Shield et al. 2015). Shrub willows are mostly grown on marginal agricultural land or reclaimed mine land that is unsuitable for commodity crops, making them an economically competitive crop. From a breeding perspective, shrub willow is characterized by high genetic diversity and a low level of domestication, providing abundant genetic resources for developing desirable feedstock traits for sustainable bioenergy production through both conventional breeding and advanced genetic engineering (Karp & Shield 2008; Karp et al., 2011). Another breeding advantage is that it takes only 2 years from seed to flowering, much faster than is typical in tree breeding. A relatively small genome and efficient vegetative propagation by cuttings also facilitate the breeding process. On the other hand, because of high genetic diversity, polyploidy and the inability to self, it is hard to maintain the prepotency of desirable traits, and it is time-consuming to measure traits, especially those expressed at later stages, such as biomass yield. Identifying molecular markers associated with beneficial traits and applying marker-assisted selection (MAS) at the nursery stage could advance willow breeding and save time on phenotypic screening and trait verification (Karp et al. 2011).
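The yield and rotation figures quoted in this section can be combined with simple arithmetic to estimate the biomass removed at each harvest. The sketch below assumes a constant annual growth rate within each rotation and ignores the establishment year; both are simplifications of this example, not claims from the cited studies, and the helper names are hypothetical.

```python
def yield_per_harvest(annual_yield_mg_ha: float, cycle_years: int) -> float:
    """Dry biomass (Mg/ha) removed at one harvest, assuming constant annual growth."""
    return annual_yield_mg_ha * cycle_years


def harvests_in_lifetime(lifetime_years: int, cycle_years: int) -> int:
    """Complete harvest cycles in a plantation's productive life (establishment year ignored)."""
    return lifetime_years // cycle_years


# Figures from the text: 10-15 dry Mg/ha/yr, 2- to 4-year cycles, 20+ year plantation life.
for cycle in (2, 3, 4):
    low, high = yield_per_harvest(10, cycle), yield_per_harvest(15, cycle)
    print(f"{cycle}-year cycle: {low:.0f}-{high:.0f} Mg/ha per harvest, "
          f"{harvests_in_lifetime(20, cycle)} harvests in 20 years")
```

For a 3-year cycle, for example, this gives 30–45 dry Mg/ha per harvest and 6 harvests over 20 years.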

Despite the superior properties of shrub willow as an energy crop, it is not widely planted, and the willow-to-biofuel production system has not been broadly adopted because the supply chain from plantation to end users is immature (Buchholz & Volk 2011). Developing short rotation shrub willow varieties with high yield, pest resistance and adaptability to marginal agricultural land is essential for the growth of new-generation bioenergy.

1.4 Use of Shrub Willow in Other Environmental Projects

The development of the willow production system not only contributes to the willow breeding program but also provides a basis for the use of shrub willow in other environmental projects such as phytoremediation. Phytoremediation is the practice of using plants to clean up contaminated environments, including soil, air, and water (Reichenauer and Germida, 2008). Shrub willow is notable for its ability to absorb metals and purify water and soil. Planting shrub willow on soil contaminated with heavy metals or on reclaimed mine land can help mitigate soil pollution and improve water quality, substantially restoring arable land. In addition, its large root system on rocky land can help establish riparian buffer zones, prevent soil erosion, and sequester carbon belowground.

CHAPTER 2

Transcriptome Analysis of Contrasting Resistance to Herbivory by Empoasca fabae in Two Shrub Willow Species and Their Hybrid Progeny¹

Wang, W.¹, Carlson, C.H.², Smart, L.B.², and Carlson, J.E.¹

¹Ecosystem Science and Management, Pennsylvania State University, University Park, PA 16802

²Horticulture Section, School of Integrative Plant Science, Cornell University, Geneva, NY 14456

¹ Chapter 2 is currently being prepared for publication and was reformatted from a manuscript with co-authors. This work was done in collaboration with Dr. Larry Smart and his PhD student Craig Carlson. The majority of the work, including RNA extraction, sequencing data analysis, and manuscript composition, was done by me.

Abstract

Shrub willow (Salix spp.), a short rotation woody biomass crop, has superior properties as a perennial energy crop for the Northeast and Midwest US. However, the insect pest potato leafhopper, Empoasca fabae (Harris) (PLH), can pose a serious threat to shrub willow productivity. At present, the use of resistant cultivars is the optimal strategy for pest control. Bioenergy willow cultivars currently in use display varying levels of susceptibility to PLH infestation, but genes and markers for PLH resistance are not yet available to assist in breeding. In this study, transcriptome analysis was conducted after PLH challenge on progeny selected from a cross of two shrub willow species with contrasting responses to PLH (resistant S. purpurea genotype 94006 and susceptible S. viminalis cultivar ‘Jorr’). Over 600 million RNA-Seq reads were mapped to the Salix purpurea reference transcriptome. Gene expression analyses revealed specific defenses in the resistant genotype 94006, including PLH-induced secondary cell wall modification. In the susceptible plants, genes involved in programmed cell death were highly expressed, which may account for pest-derived symptoms such as necrosis, leaf curling, and early leaf drop. Overall, the identified candidate resistance genes and defense mechanisms provide new resources for shrub willow breeding and research.

Abbreviations

PLH: Potato Leafhopper

DEG: Differentially Expressed Gene

WGCNA: Weighted Gene Co-expression Network Analysis

GO: Gene Ontology

KEGG: Kyoto Encyclopedia of Genes and Genomes

QTL: Quantitative Trait Loci

TF: Transcription Factor

TAIR: The Arabidopsis Information Resource

Introduction

The increasing worldwide demand for energy, together with rising atmospheric greenhouse gas levels and accumulating evidence of climate change, calls for the development of renewable energy sources. Shrub willow (Salix spp.), a short rotation woody biomass crop, has great potential as a renewable and sustainable bioenergy source to replace traditional petroleum energy and 'first generation' biofuels produced from food crops (Volk et al., 2006). Shrub willow has multiple advantages as a feedstock for biofuels and bioproducts. It is fast-growing with a short harvest cycle, and it has a high biomass yield, a high net energy ratio, and relatively low demand for fertilizer and management inputs (Stoof et al., 2015). It can also adapt to harsh field conditions, such as underutilized or marginal agricultural land (Smart and Cameron, 2012), and it has favorable environmental impacts such as soil remediation and land reclamation (Kuzovkina and Quigley, 2005).

Salix and Populus, which comprise the family Salicaceae s. str., have largely collinear genetic maps (Hanley et al. 2006). There are 450-520 species of Salix worldwide, growing mainly in temperate and arctic areas of the northern hemisphere (Argus, 2010). The genus Salix has high levels of genetic diversity, ranging from individual polymorphisms to interspecific hybridization and polyploidy. Most Salix species are diploid with a base chromosome number of 19 (2n = 38), but ploidy levels vary from diploid to dodecaploid (Fogelqvist et al., 2015). Diversity in willow genetic resources provides potential for developing improved cultivars with desirable traits such as increased yield, lower incidence of rust infection (Melampsora spp.), high water use efficiency, and better adaptation to marginal land (Stanton et al., 2014).

Among all the threats to shrub willow growth, the potato leafhopper (PLH), Empoasca fabae (Harris), an insect pest in the eastern and midwestern US and parts of eastern Canada (Lamp et al., 1991; Fick et al., 2003), causes severe damage to shrub willow. PLH populations originate in the Gulf Coast and the southeastern US and then migrate to the northeastern states, where their arrival time and development rate are affected by weather conditions and host species availability (Taylor and Shields, 1995). The hosts of potato leafhopper include over 220 plant species in 26 families (Chasen et al., 2014), but its primary host is the field crop alfalfa, as well as potatoes and various legumes, on which it poses a significant agricultural threat (Lamp et al., 1994). Populations of PLH on shrub willows usually appear in mid- to late June and peak in early July, varying with the time of arrival and current temperatures. Under ideal weather conditions, PLH populations propagate quickly, producing up to six generations in one growing season. Both adults and nymphs damage willows through their lacerate-and-flush feeding behavior (Kabrick and Backus, 1990), which consists of continuously rupturing cells (80% of probing time), secreting watery saliva, and ingesting plant fluids from phloem tissues (Ecale and Backus, 1995a). Studies on alfalfa (Medicago sativa L.) plants injured by PLH identified a cascade of anatomical changes in stem vascular tissues within a few minutes of laceration (Ecale and Backus, 1995b; Zhou and Backus, 1999). The damage starts with the rupturing, crushing, and later blockage of phloem cells (Ecale and Backus, 1995b), along with increased cell division in atypical planes and the development of wound phloem transfer cells, which resemble callus tissue (Zhou and Backus, 1999). The disorganization of vascular bundles causes a syndrome called “hopperburn”, a sequence of abnormal states: tip dehydration and wilting, leaf chlorosis, early leaf drop, and restricted internode growth with consequent stunting (Backus et al., 2005). The most direct and serious harm of PLH to biomass yield is stunted plant growth (Kopp, 2000) and weakened plant defenses, especially under other stress conditions such as drought. Shrub willow genotypes vary greatly in their susceptibility to PLH; some genotypes are so sensitive that they suffer complete loss of yield. Salix viminalis, which originates from Europe, and its derived cultivars are particularly susceptible to the potato leafhopper (Labrecque and Teodorescu, 2005; Smart and Cameron, 2008), while hybrids of S. viminalis with S. miyabeana and S. purpurea display varying degrees of resistance (Gouker and Smart, 2015; Fabio et al., 2016). Despite its susceptibility, however, S. viminalis harbors desirable physiological traits for biomass production, which keeps it popular among willow breeders and growers (Karp et al., 2011). Chemical insecticides are used to control PLH on other host crops (Huseth et al., 2014), but no agrochemical management is currently deployed on shrub willow because of the extra cost. Thus, the development of resistant cultivars is required for larger-scale commercial deployment of shrub willow.

Advances in molecular biology and high-throughput sequencing greatly facilitate the exploration of differential transcriptome expression in sensitive and resistant willow genotypes under PLH stress. This study analyzed the transcriptomes of two willow species with contrasting PLH resistance, S. purpurea and S. viminalis, and of their F1 full-sibling progeny, which display variable levels of susceptibility to PLH. This research provides deeper insights into the defense mechanisms, key processes, and candidate genes that determine cultivar-specific resistance of shrub willow to PLH attack. We hope these results will help guide the future selection of tools for breeding improved cultivars for the biofuel industry in the northeastern US.

Results

Phenotypic responses of shrub willow to PLH attack in greenhouse and field trials

In a greenhouse no-choice feeding trial, symptoms of PLH feeding were visible on some plants within hours; leaf curling began before the first tissue collection point at 6 h. Exposure to PLH feeding also resulted in differences in stem elongation over the 11-day period (Figure 2.1A). The resistant parent, S. purpurea 94006, added a mean of 28.3% to total stem length per plant over the 11 days of the experiment, whereas the susceptible parent, S. viminalis ‘Jorr’, increased total stem length by a mean of only 7.1%. The percent change among progeny ranged from 14.1% for genotype 11X-407-085 to only 2.5% for genotype 11X-407-089, with an overall mean of 7.6% for all hybrid progeny. Damage severity (shoot tip and young leaf necrosis) also differed by genotype (Figure 2.1B), with a mean scaled percentage of 4.5% on 94006 and 16.7% on ‘Jorr’. Among the progeny, damage severity ranged from 3.5% for genotype 11X-407-059 to 27.5% for genotype 11X-407-069, with a mean of 12.1%. There was a significant negative relationship (p = 0.0004, R² = 0.16) between stem elongation rate and damage severity.

The variation among plants in response to PLH attack observed in the greenhouse no-choice feeding trial was generally consistent with performance in the field trials, where PLH adults could freely choose their hosts. Survey data from the mid-season 2014 and 2015 field trials of 100 hybrid progeny were used to select 18 hybrid progeny representing the full range of susceptibility, scored as shoot tip necrosis, leaf curling, and stem height. There was a significant positive correlation (P < 0.0001, R² = 0.24) between field height measurements in 2015 and stem growth rate in the greenhouse no-choice feeding trial. There was also a significant negative correlation (P = 0.0001, R² = 0.18) between shoot damage severity scored in the field in 2014 and stem growth rate in the greenhouse. Based on the damage survey and growth measurements, we selected seven progeny covering a wide range of susceptibilities for transcriptome sequencing, along with their parents, 94006 and ‘Jorr’, as controls.

RNA Sequencing and quality assessment

We extracted total RNA from 96 shrub willow leaf samples, comprising 2 or 3 replicates of each treatment condition (4 time points × 9 genotypes). RNA sequencing of the 96 mRNA libraries on the Illumina HiSeq2500 platform yielded a total of 612,869,652 paired-end reads of 101 bp. Library sizes ranged from 4,266,920 to 9,159,176 reads, with a mean of 6,384,059. Clean reads obtained after trimming and filtering were mapped to the S. purpurea primary transcript sequences (Salix purpurea v1.0, Phytozome 12.0, DOE-JGI) using Bowtie2 version 2.2.4 (Langmead & Salzberg, 2012). Mapping rates ranged from 68.99% to 81.67%.
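
The sequencing summary can be checked with simple arithmetic: 4 time points × 9 genotypes give 36 treatment conditions, so 2 or 3 replicates each allow 72 to 108 libraries, which brackets the 96 sequenced, and the total read count divided by 96 reproduces the reported mean library size. The `mapping_rate` helper and its counts are hypothetical, included only to show how a percentage like those above is formed.

```python
def mean_library_size(total_reads: int, n_libraries: int) -> float:
    """Average number of reads per sequencing library."""
    return total_reads / n_libraries


def mapping_rate(mapped_reads: int, clean_reads: int) -> float:
    """Percent of clean (trimmed and filtered) reads aligned to the reference."""
    return 100.0 * mapped_reads / clean_reads


n_conditions = 4 * 9          # time points x genotypes
n_libraries = 96
total_reads = 612_869_652     # total reported across all libraries

# 2-3 replicates per condition must bracket the number of libraries.
assert 2 * n_conditions <= n_libraries <= 3 * n_conditions

mean_size = mean_library_size(total_reads, n_libraries)
print(f"mean library size: {mean_size:,.0f}")  # 6,384,059

# Hypothetical counts for a single library, for illustration only.
print(f"mapping rate: {mapping_rate(7_500_000, 10_000_000):.2f}%")
```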

Differentiation of parent transcriptomes based on genotype and defense-response timing

A principal component analysis (PCA) of all the parents' RNA-Seq libraries, including biological replicates at each time point, revealed an underlying structure in the expression profiles that separated samples by both genotype and treatment (Figure 2.2). PC1, which explains 73% of the total variance, separated all libraries into two groups by parent genotype, suggesting that genomic background is the main determinant of transcriptome differences. Along PC2, the 6 h libraries of both parent genotypes were separated from those of all other time points, indicating that the largest defensive response occurred within 6 h. This is consistent with the phenotypic observation that leaf curling appeared before the first sampling point, 6 h after exposure to potato leafhoppers. Another noticeable pattern in the PCA plot was that the libraries of the resistant parent, 94006, were more dispersed than those of the susceptible parent ‘Jorr’. This suggests that herbivore resistance in 94006 may generally involve a greater degree of gene expression modulation in response to PLH attack than occurs in sensitive genotypes.
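
The PCA described above can be sketched with NumPy on a samples × genes matrix of transformed (for example, log-scale or variance-stabilized) expression values. As an assumption, the sketch keeps the 500 most variable genes before decomposition, similar in spirit to the default of DESeq2's `plotPCA` (`ntop = 500`); the chapter does not state which genes entered the PCA. The simulated matrix is hypothetical, with a built-in genotype effect so that PC1 separates the two groups, as genotype did in Figure 2.2.

```python
import numpy as np


def pca_variance_explained(expr: np.ndarray, n_top: int = 500):
    """PCA of a samples x genes matrix via SVD of the gene-centered data.

    Keeps the n_top most variable genes. Returns the sample scores on each
    component and the fraction of total variance each component explains.
    """
    top = np.argsort(expr.var(axis=0))[::-1][:n_top]
    centered = expr[:, top] - expr[:, top].mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                      # sample coordinates on each PC
    frac = s**2 / np.sum(s**2)          # variance explained per PC
    return scores, frac


rng = np.random.default_rng(0)
# Hypothetical data: 8 libraries x 1000 genes; libraries 0-3 ("genotype A")
# carry a shift in the first 300 genes. Not data from this study.
expr = rng.normal(5.0, 0.3, size=(8, 1000))
expr[:4, :300] += 2.0

scores, frac = pca_variance_explained(expr)
print(f"PC1 explains {frac[0]:.0%} of the variance")
print("PC1 scores:", np.round(scores[:, 0], 1))
```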

Identification of differentially expressed genes between parents from RNA-Seq data

Differentially expressed genes (DEG) were identified with the R package DESeq2 (Love et al. 2014) through pairwise comparisons of the transcriptome at each time point against time point 0, the non-treatment control. DEG were called at a false discovery rate (FDR) < 0.05 and an expression fold change > 2. Three sets of DEG (6 h vs 0 h, 24 h vs 0 h, and 96 h vs 0 h) were called separately for each parent genotype, and each set was further split into up- and down-regulated groups. Combining the results for both parents, a total of 6,983 non-redundant DEG (genes belonging to multiple DEG lists were counted only once) were identified from the six pairwise comparisons. Venn diagrams (Figure 2.3) show the relative numbers of DEG identified at different time points and the overlaps among them, with up- and down-regulated genes shown separately for the two parents. As shown in Figure 2.3, the largest change in gene expression occurred by the 6 h time point in both parents. However, there were more DEG at 6 h in 94006 (4,493) than in ‘Jorr’ (1,670), both in total and relative to the 24 h time point, for both up- and down-regulated genes. Additionally, the Venn diagrams illustrate that the greatest gene expression changes in 94006 occurred at 6 h, whereas in ‘Jorr’ the changes were smaller and at about the same level at 6 h and 24 h, suggesting a more sensitive response and more rapid initiation of defensive processes in the resistant 94006.
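
The DEG calling criteria (FDR < 0.05 and fold change > 2, i.e. |log2 fold change| > 1) and the non-redundant count can be sketched in plain Python. DESeq2 itself reports Benjamini-Hochberg adjusted p-values (`padj`) by default, after an independent-filtering step that this sketch omits; the gene names, fold changes, and p-values below are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_degs(genes, log2fc, pvalues, fdr_cutoff=0.05, min_fold=2.0) -> Tuple[Set[str], Set[str]]:
    """Split genes passing the FDR and fold-change thresholds into up/down sets."""
    padj = benjamini_hochberg(pvalues)
    lfc_cutoff = math.log2(min_fold)  # fold change > 2  <=>  |log2FC| > 1
    up = {g for g, l, q in zip(genes, log2fc, padj) if q < fdr_cutoff and l > lfc_cutoff}
    down = {g for g, l, q in zip(genes, log2fc, padj) if q < fdr_cutoff and l < -lfc_cutoff}
    return up, down


# Hypothetical results for one contrast (e.g. 6 h vs 0 h); not study data.
genes = ["g1", "g2", "g3", "g4", "g5"]
log2fc = [2.5, -1.8, 0.4, 3.1, -0.2]
pvals = [1e-6, 1e-4, 1e-5, 0.2, 0.03]
up_6h, down_6h = call_degs(genes, log2fc, pvals)

# Non-redundant count: a gene appearing in several DEG lists is counted once.
deg_lists = [up_6h, down_6h, {"g1", "g9"}]  # third list is hypothetical
non_redundant = set().union(*deg_lists)
print(sorted(up_6h), sorted(down_6h), len(non_redundant))
```

Representing each DEG list as a set makes the "counted only once" rule a plain set union.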

Weighted gene correlation network analysis (WGCNA) identified three clusters of genes associated with specific resistance mechanisms in the parents and hybrid progeny

In general, genes with synchronized behavior, so-called co-expression, are likely to be functionally associated or interdependent because they belong to common pathways or expression networks. In co-expression analyses using the weighted gene correlation network analysis (WGCNA) approach, genes with tightly correlated expression patterns are identified and extracted to build networks associated with specific biological processes. Based on the time-course RNA-Seq dataset for all nine genotypes (2 parents and 7 hybrid progeny), WGCNA detected 25 gene clusters at a stringency threshold of 0.75. To interpret the biological function of each cluster, WGCNA was then used to test correlations between gene clusters and phenotypic measurements taken in both the greenhouse and field trials for several traits, including visual scores for PLH shoot damage, necrosis, and leaf curl at different time points, and stem elongation rates calculated from height data, which were averaged to represent pest susceptibility (see Methods). For each cluster, WGCNA calculated an eigengene to summarize the cluster's overall expression profile and then correlated each eigengene with the phenotypic traits, generating a table of cluster-trait relationships from which the most significant correlations can be identified (Figure 2.4B). In this table, three clusters (darkgreen, magenta, black) displayed consistently strong (positive or negative) correlations with all phenotypic traits, indicating a potential functional link to PLH resistance and growth traits. The cluster in the bottom row of the table (grey) contains genes that could not be assigned to any other cluster. The light color of the cells in this row indicates that correlations between the grey cluster and each phenotypic trait are weak, so this cluster can serve as a negative control supporting the validity of the module-trait relationships.
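
The eigengene-trait step can be sketched with NumPy. In WGCNA a module eigengene is the first principal component of the module's standardized expression matrix; the sketch below computes it by SVD, orients its sign to agree with the module's average expression (a common convention, since SVD signs are arbitrary), and then takes its Pearson correlation with a trait. Network construction, soft thresholding, and module detection are omitted, and the simulated module and trait are hypothetical.

```python
import numpy as np


def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """First principal component (sample scores) of standardized expression.

    expr is a samples x genes matrix for the genes in one module.
    """
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0]
    # SVD signs are arbitrary; align with the module's average expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def module_trait_correlation(expr: np.ndarray, trait: np.ndarray) -> float:
    """Pearson correlation between a module eigengene and a sample trait."""
    return float(np.corrcoef(module_eigengene(expr), trait)[0, 1])


rng = np.random.default_rng(1)
n_samples, n_genes = 12, 30
# Hypothetical trait (e.g. a stem growth score) and a module whose genes
# all track it with different loadings plus noise. Not study data.
trait = rng.normal(size=n_samples)
loadings = rng.uniform(0.5, 1.5, size=n_genes)
module = np.outer(trait, loadings) + rng.normal(0.0, 0.3, size=(n_samples, n_genes))
# A "grey"-like set of unrelated genes for comparison.
unrelated = rng.normal(size=(n_samples, n_genes))

print(f"module-trait r = {module_trait_correlation(module, trait):.2f}")
print(f"unrelated-trait r = {module_trait_correlation(unrelated, trait):.2f}")
```

In this simulation the unrelated genes play the role of the grey cluster: their eigengene has no built-in link to the trait.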

Functional annotation enrichment analysis of gene cluster I (darkgreen)

The genes from the three clusters of interest (darkgreen, magenta, and black, identified above as strongly correlated with phenotypic traits) were submitted to the AgriGO online toolkit for gene ontology (GO) enrichment analysis. A total of 885 genes were grouped in the darkgreen cluster. The eigengene of this cluster was strongly positively correlated with stem growth and negatively correlated with PLH damage severity, suggesting that these genes are associated with plant growth and pest resistance. The eigengene expression profile for the parent genotypes (Figure 2.4C) revealed a trend of increasing expression over time after pest infestation, with distinctly different regulation in the two parents: induction in the resistant parent 94006 versus suppression in the susceptible parent ‘Jorr’. These distinct expression patterns suggest that the genes in this cluster are likely involved in defensive processes in S. purpurea 94006.

Functional annotation of this set of genes includes significantly enriched GO terms in the biological process categories of ‘cell wall biosynthesis’ (GO:0042546, FDR: 1.9e-14), ‘aromatic compounds biosynthetic process’ (GO:0019438, FDR: 0.00027), ‘phenylpropanoid metabolic process’ (GO:0009698, FDR: 0.00059), and ‘response to stimulus’ (GO:0050896, FDR: 0.011) (Supplemental Table 1). In total, 15 enriched GO terms (FDR < 0.05) are related to ‘cell wall biological process’ (GO:0042546, FDR: 1.7e-06). Specifically, the three top GO terms, ‘secondary cell wall biogenesis’ (GO:0009832, FDR: 1.3e-09), ‘xylan biosynthetic process’ (GO:0045492, FDR: 4.7e-07), and ‘hemicellulose metabolic process’ (GO:0010410, FDR: 9.7e-07), highlight the role of secondary cell wall metabolism, specifically cellulose, hemicellulose, and lignin synthesis, in the defense strategy of the resistant genotype 94006. The terms ‘cellular aromatic compound metabolic process’ (GO:0006725, FDR: 0.00021) and ‘aromatic compounds biosynthetic process’ may also indicate lignin biosynthesis, since lignin is an aromatic polymer deposited mainly during secondary cell wall thickening, where it provides strength and rigidity. In maize, secondary cell walls were found to be thicker in genotypes resistant to corn borers, obstructing mechanical rupture of pith tissues by the larvae (Santiago et al., 2013). In addition to lignin, phenylpropanoids, aromatic compounds, and their derivatives are precursors for the biosynthesis of flavonoids and condensed tannins, which have important functions in both abiotic and biotic stress defenses (Pascual et al., 2016). Phenolic secondary metabolites also play pivotal roles in plant chemical defense as antifeedants and toxins (Heldt and Piechulla, 2011).
Finally, in cluster I, 105/885 genes were assigned to the ‘response to stimulus’ (GO:0050896, FDR: 0.011) biological process, indicating that PLH attack is perceived and that signal transduction and basic defense responses are initiated in response to mechanical damage from stylet laceration or to components of PLH saliva.
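
GO enrichment of a cluster is commonly tested by asking how surprising the overlap between the cluster and a GO term is, given the background gene set. A minimal sketch, assuming a one-sided hypergeometric test (as we understand it, agriGO offers this and other tests such as Fisher's exact test, plus multiple-testing corrections that are omitted here): the 105 of 885 cluster I genes annotated to ‘response to stimulus’ come from the text above, while the background size and the number of background genes carrying the term are hypothetical placeholders.

```python
from math import comb


def hypergeom_upper_tail(k: int, n_drawn: int, n_term: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total genes, n_term annotated, n_drawn sampled).

    Uses exact integer arithmetic; the final division yields a float.
    """
    top = min(n_drawn, n_term)
    hits = sum(comb(n_term, i) * comb(n_total - n_term, n_drawn - i) for i in range(k, top + 1))
    return hits / comb(n_total, n_drawn)


cluster_size = 885          # genes in cluster I (from the text)
cluster_hits = 105          # cluster I genes annotated to the term (from the text)
background = 20_000         # hypothetical number of annotated background genes
background_hits = 1_500     # hypothetical background genes with the term

expected = cluster_size * background_hits / background
p = hypergeom_upper_tail(cluster_hits, cluster_size, background_hits, background)
print(f"expected ~{expected:.1f} genes, observed {cluster_hits}, p = {p:.2e}")
```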

Functional annotation enrichment analysis of gene cluster II (magenta)

There are 859 genes in cluster II (magenta), whose eigengene expression profile does not, in general, fluctuate strongly over time after PLH treatment (Figure 2.4D). However, expression of the magenta cluster genes is constitutively higher in the resistant genotype 94006 than in ‘Jorr’. The most represented GO term in the biological process category was ‘response to stimulus’ (GO:0050896, FDR: 2.2e-07), which included 127/859 genes. Of these 127 genes, 51 were assigned to the child term ‘response to abiotic stimulus’ (GO:0009628, FDR: 0.001), and 53 were assigned to two other child terms, ‘response to other organism’ (GO:0051707, FDR: 0.0051) and ‘response to biotic stimulus’ (GO:0009607, FDR: 0.0051) (Table S2.2). Among the other enriched GO terms in cluster II, some were associated with signal perception and transduction, such as ‘hormone-mediated signaling pathway’ (GO:0009755, FDR: 9.10e-05) and ‘multidrug transport’ (GO:0006855, FDR: 7.80e-05). KEGG pathway mapping of cluster II genes identified ‘tropane, piperidine and pyridine alkaloid biosynthesis’ (ko:00960, p-value: 0.0027) as the most enriched pathway. Alkaloids, derived from amino acid metabolism, are known anti-herbivore secondary metabolites, and the production of various alkaloids suggests that the host plant uses them as a chemical defensive barrier. Notably, three other enriched KEGG pathways are associated with signal transduction: ‘other glycan degradation’ (ko:00511, p-value: 0.0127), ‘fatty acid degradation’ (ko:00071, p-value: 0.0159), and ‘ABC transporters’ (ko:02010, p-value: 0.031). Free glycans released from the cell wall act as signals that initiate plant defense responses through recognition by receptors on the plasma membrane (Etzler and Esko, 2009).
Fatty acids (FAs) and their derived metabolites, which are released from membranes when triggered by environmental stimuli, function as second messengers and modulators of the plant innate immune system (Walley et al., 2013). Plant ABC transporters contribute to the transport of endogenous defensive secondary metabolites such as alkaloids, terpenoids, polyphenols, and quinones (Yazaki, 2005). In addition, fatty acid oxidation leads to the biosynthesis of jasmonic acid (JA), a phytohormone that, together with salicylic acid (SA), is vital in regulating plant defenses against biotic stress (Kang et al., 2011). Furthermore, two Rad genes (Rad50, Rad27) were mapped to the KEGG pathway ‘non-homologous end-joining’ (NHEJ; ko:03450, p-value: 0.022). Double-strand DNA breaks (DSBs), which can be induced by both endogenous agents and environmental elicitors, are repaired mainly via non-homologous end joining (Gorbunova, 1997). DSBs have been detected in Arabidopsis after pathogen infection, revealing an interconnection between DNA damage repair and plant immunity (Song and Bent, 2014).

Functional annotation enrichment analysis of gene cluster III (black)

Cluster III includes 685 genes that WGCNA showed to be negatively correlated with stem growth and positively correlated with plant susceptibility (reflected in symptoms such as necrosis, leaf curling, and shoot damage). The eigengene expression pattern (Figure 2.4E) shows generally lower expression in the resistant parent 94006 and higher expression in the susceptible ‘Jorr’, as well as a general increase over time after PLH treatment. GO enrichment analysis of cluster III genes identified terms highly enriched in the biological process category, including ‘death’ (GO:0016265, FDR: 2.90e-08), ‘cell death’ (GO:0008219, FDR: 2.90e-08), ‘programmed cell death’ (GO:0012501, FDR: 1.10e-08), and ‘apoptosis’ (GO:0006915, FDR: 1.70e-06), indicating that these genes may account for pest damage symptoms such as necrosis, leaf curling, and leaf abscission in the susceptible genotype through programmed cell death in the leaves. Another enriched GO term is ‘defense response’ (GO:0006952, FDR: 1.30e-05), which includes 32 genes. These defense genes were highly expressed specifically in ‘Jorr’ but at low levels in 94006, indicating that the susceptible parent ‘Jorr’ mounted specific defensive responses after PLH attack that were insufficient to protect it against PLH.

PLH-resistance associated transcription factor genes and their regulatory networks

Transcriptional regulation of gene expression under stress conditions is pivotal to plant defense responses (Rushton and Somssich, 1998). Transcription factors (TFs) temporally and spatially regulate the expression of their target genes by binding to cis-elements. To detect master regulators among the resistance-related genes in the darkgreen cluster, the plant transcription factor database (PlnTFDB) (Perez-Rodriguez et al., 2010) and PlantRegMap (Jin et al., 2015) were used for regulation prediction and functional enrichment analysis by mapping the input genes to curated Arabidopsis transcriptional regulatory interactions. Of the 885 genes in cluster I, 696 were assigned unique TAIR IDs, and 51 TFs in 18 families and 218 unique regulatory interactions were identified, forming the regulatory network shown in Figure 2.5. Ten hub TFs at the center of the network, with TAIR IDs AT1G09540, AT1G32770, AT1G75240, AT1G78700, AT2G01940, AT3G12250, AT4G28500, AT4G29230, AT4G30080, and AT5G12870, ranked highest for regulatory connectivity with other genes in this cluster. Of these 10 hub genes, five belong to the NAC (3) and MYB (2) families and are associated with secondary wall biosynthesis; four are involved in phytohormone signaling pathways: auxin (two ethylene and ARF TFs), abscisic acid (one ZF-HD TF), and brassinosteroid (one BES1 TF); and the remaining gene belongs to the bZIP family, which regulates plant systemic acquired resistance. This functional annotation of the 10 TFs agrees with the GO and KEGG enrichment results for the genes in this cluster, highlighting a defensive strategy in the resistant parent 94006 that combines several biotic resistance processes, including secondary cell wall strengthening, phytohormone signal transduction, and initiation of systemic acquired resistance.
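
Ranking hubs by regulatory connectivity can be illustrated with degree, the number of distinct interactions each TF takes part in. This is an assumption: the chapter does not state the exact connectivity measure, and PlantRegMap's own ranking may differ. The TF and gene names below are hypothetical placeholders, not the TAIR IDs listed above.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def rank_hubs(edges: Iterable[Tuple[str, str]], tfs: Set[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank TFs by degree in a directed regulator -> target network.

    Duplicate interactions are counted once; both outgoing (regulating)
    and incoming (regulated) edges contribute to a node's degree.
    """
    degree: Counter = Counter()
    for regulator, target in set(edges):
        degree[regulator] += 1
        degree[target] += 1
    ranked = sorted(((tf, degree[tf]) for tf in tfs), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical network; ("TF_A", "g1") appears twice to show de-duplication.
edges = [
    ("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"), ("TF_A", "TF_B"),
    ("TF_B", "g1"), ("TF_B", "g4"), ("TF_C", "g2"), ("TF_A", "g1"),
]
print(rank_hubs(edges, {"TF_A", "TF_B", "TF_C"}, top_n=2))
# [('TF_A', 4), ('TF_B', 3)]
```

Other centrality measures (for example, betweenness) could rank hubs differently; degree is used here only because it is the simplest reading of "connectivity".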

Pair-wise comparison of parents’ Time-0 transcriptomes

In our study, we observed a remarkable discrepancy between the greenhouse experiment and the field trial. In the field, the resistant genotype 94006 displayed significantly less damage than the susceptible ‘Jorr’, as expected. However, the greenhouse no-choice feeding trial showed that PLH also caused characteristic damage when forced to feed on genotypes that were resistant in the field, suggesting that host choice by PLH adults might be the primary basis for field resistance of S. purpurea. To test the hypothesis of host choice as a resistance mechanism in genotype 94006, we compared the parents' time-0 transcriptomes, prior to PLH challenge. At this no-pest time point, the genes most highly expressed in the susceptible ‘Jorr’ genotype (S. viminalis) were significantly enriched in the GO terms ‘cellular process’ (GO:0009987, FDR: 1.8e-27) and ‘metabolic process’ (GO:0008152, FDR: 3.4e-20). In contrast, genes with constitutively higher expression in the resistant S. purpurea parent, relative to the S. viminalis parent, were assigned to the top GO term ‘response to stimulus’ (GO:0050896, FDR: 2e-22), which includes biotic stress response sub-categories such as ‘response to biotic stimulus’ (GO:0009607, FDR: 8.6e-08) and ‘response to chitin’ (GO:0010200, FDR: 2.3e-07). In addition to GO enrichment analysis, we used MapMan (Thimm et al., 2004), another functional annotation tool, to identify and visualize the processes and pathways distinctly enriched between the two parent species (Figure 2.6). Figure 2.6A shows the overall mapping of DEG to MapMan functional categories, and Figure 2.6B highlights the transcriptional changes related to overall metabolism. Consistent with the GO analysis, the highly expressed genes in S. purpurea mapped primarily to the functional categories of biotic and abiotic stress (Figure 2.6A).

As expected, genes highly expressed in S. viminalis mapped to the categories of protein synthesis and amino acid activation, and of cell division and cell cycle. Another interesting finding was a group of genes with higher expression in S. viminalis that were assigned to DNA repair and DNA synthesis processes, which may relate to faster vegetative growth in this parent. A closer look at the detailed metabolic pathways showed that genes more highly expressed in S. viminalis mapped to photosynthesis, especially photorespiration, which again may be related to vigorous growth. The genes more highly expressed in S. purpurea were largely involved in the biosynthesis of secondary metabolites such as terpenes, flavonoids, phenylpropanoids, and phenolics. These metabolites act as toxins and feeding deterrents and play important defensive roles against many herbivorous insects (Taiz & Zeiger, 2010).

Dominance accounts for a large majority of the differential expression among F1 progeny

The mode of gene expression inheritance was assessed in the F1 interspecific S. purpurea × S. viminalis progeny at the four time points. At each time point, the gene expression inheritance patterns of the F1 individuals were relatively uniform across the 19 haploid chromosomes of Salix (Figure S2.2). Across time points, however, there were significant differences in the number of differentially expressed genes and in the patterns of gene expression inheritance. Overall, the average number of P1 (S. purpurea)-dominant genes in the F1 family declined linearly over the 96 h period, from 1,552 at T0 to 988 at T96 (Table S2.1). In contrast, the average number of P2 (S. viminalis)-dominant genes increased from 1,125 at T0 to 1,485 at T96 in an oscillating pattern, with more genes at T6 (1,404) and T96 (1,485) than at T0 (1,125) and T24 (1,219) (Table S2.1). The number of genes with over- or under-dominant inheritance increased from T0 to T96, although less dramatically than the number showing uniparental or dominant patterns of inheritance, whereas the number of genes with additive inheritance decreased on average (70 at T0 and 58 at T96). Averaging masked variation among the progeny, as most individual clones showed more dramatic changes over time. The number of additively inherited genes likewise fluctuated, increasing from T0 (70) to T6 (89) before declining thereafter. In a previous study using full-sib intraspecific F1 and F2 S. purpurea families, Carlson et al. (2017) described dominant inheritance as the primary source of differential expression in both shoot tip and internode tissues. While there was less differential expression in the F2 family, the F1 S. purpurea individuals showed a high proportion of maternal P1-dominant gene expression, irrespective of tissue type. Here, we report that dominance accounts for a large majority of the differential expression identified in the F1 S. purpurea × S. viminalis family. However, even over this brief period (96 h), the P1:P2 ratios of uniparental inheritance among the F1 individuals were not static but tended to oscillate (Figure S2.4).
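
The inheritance categories used above (P1- and P2-dominant, additive, over- and under-dominant) can be made concrete with a simplified decision rule applied gene by gene. This is a hypothetical sketch: it assumes that significance calls for F1 versus each parent are already available, and its rules and the "no change" label are illustrative, not necessarily the exact criteria used here or by Carlson et al. (2017).

```python
from collections import Counter


def classify_inheritance(f1: float, p1: float, p2: float,
                         differs_from_p1: bool, differs_from_p2: bool) -> str:
    """Assign a simplified inheritance category to one gene in one F1 individual.

    f1, p1, p2 are mean expression levels; the booleans are the outcomes of
    separate differential-expression tests of F1 vs each parent.
    """
    if not differs_from_p1 and not differs_from_p2:
        return "no change"
    if not differs_from_p1:
        return "P1-dominant"      # F1 resembles P1 but not P2
    if not differs_from_p2:
        return "P2-dominant"      # F1 resembles P2 but not P1
    if f1 > max(p1, p2):
        return "overdominant"
    if f1 < min(p1, p2):
        return "underdominant"
    return "additive"             # F1 intermediate between the parents


# Hypothetical genes: (F1, P1, P2, F1 vs P1 significant, F1 vs P2 significant).
genes = [
    (5.0, 5.1, 9.0, False, True),
    (8.9, 5.0, 9.0, True, False),
    (7.0, 5.0, 9.0, True, True),
    (12.0, 5.0, 9.0, True, True),
    (2.0, 5.0, 9.0, True, True),
]
print(Counter(classify_inheritance(*g) for g in genes))
```

Tallying these labels per F1 individual and time point gives counts of the kind summarized above.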

Discussion

Biosynthesis of secondary cell wall compounds as compensation for PLH injury

The ability to recover from insect damage largely determines a plant's resistance to, and tolerance of, herbivorous pests. The sooner a plant mounts an effective recovery, the less damage accumulates from pest feeding. Previous studies revealed that PLH employ a lacerate-and-flush feeding behavior (Kabrick and Backus, 1990; Backus et al., 2005): they rupture cells by rapid movements of their stylets, causing mechanical damage, while simultaneously salivating and withdrawing phloem liquid. In alfalfa, their primary host, PLH herbivory induces a cascade of anatomical and physiological disturbances: injured phloem tissues first undergo cell wall loosening and collapse, followed by phloem blockage (Backus and Hunter, 1989; Nielson et al., 1990). The characteristic symptom, “hopperburn”, results largely from small blockages and phloem constrictions in the plant vasculature. In the following phase, wound phloem sieve elements regenerate and circumvent the damaged phloem cells (Kabrick and Backus, 1990; Ecale, 1993; Zhou and Backus, 1999); as a result, xylem tissues are reduced in size and quantity, and the loss of mature tracheary elements is compensated for by the generation of numerous thick-walled sclerenchyma fiber cells.

In our study, the resistant S. purpurea 94006 rapidly induced an arsenal of genes within 6 h of PLH feeding. Their expression rose with pest feeding and reached much higher levels than in susceptible genotypes, in which defense genes were induced more gradually. This distinctive expression pattern suggests that these genes are involved in effective defense in the resistant genotype. Their functional annotations point to roles in cell wall biosynthesis, especially of components abundant in the secondary cell wall (xylan, glucuronoxylan and lignin). Combining the phenotypic observations and transcriptome results, we speculate that PLH first injure the vascular tissue, and that host recovery depends largely on the generation of new vascular transport cells to restore translocation. In the resistant genotype 94006, cell wall biosynthesis (especially of secondary cell walls) could both restore cell wall integrity and reinforce the walls by increasing their thickness, while re-differentiation of tracheids and sieve elements with thick, lignified secondary cell walls could restore the flow of nutrients and water needed to maintain normal physiological activity. A comparable role has been reported in maize, in which secondary cell wall thickness prevents mechanical rupture of the pith tissues by corn borer insects (Santiago et al., 2013).

Constitutive resistance or priming effects?

Plants develop a variety of defense strategies to combat herbivore infestation. These defense mechanisms can occur either as constitutively expressed resistance or as responses induced by insect attack (Herms and Mattson, 1992; Kessler and Baldwin, 2002; Mauch-Mani et al., 2017). Constitutive resistance provides direct and immediate defense against insect attack, which continuously benefits plants when herbivores are present. However, allocating limited resources to the constitutive expression of resistance traits is not cost-effective in the absence of herbivores, and it diverts energy needed for plant growth and other components of fitness (Karban et al., 1997). Reflecting this trade-off, plants vary greatly in their resistance to insects, both among species (Rasmann et al., 2009) and among genotypes within species (Han and Lincoln, 1994; Kempel et al., 2010).

In free-choice feeding trials, S. purpurea shows significantly less damage than S. viminalis, whereas in no-choice feeding trials, characteristic damage is also observed on the resistant genotype (Petzoldt et al., in prep). Pair-wise comparison of the parental time 0 transcriptomes showed that genes expressed more highly in susceptible S. viminalis are enriched for the GO terms cellular metabolism and biosynthesis, consistent with active vegetative growth. Salix viminalis is one of the most widely grown willow species for short rotation cropping in Europe (Dimitrious et al., 2005; Berlin et al., 2014), and it is favored for its multiple desirable traits for biomass production, including high yield, fast growth, good coppicing, and good maintenance of growth form (Karp and Shield, 2008). Despite its high susceptibility to pests and diseases, S. viminalis is still used in breeding programs for hybridization with other Salix species to access its vigorous growth traits (Serapiglia et al., 2014; Fabio et al., 2016). Our DEG analysis highlighted functional categories of genes that may contribute to the vigor of S. viminalis. Conversely, the resistant species S. purpurea showed higher expression of a group of genes enriched for stress-response functions, suggesting that S. purpurea maintains a constitutive defense status in the absence of pests. This finding illustrates the well-known dilemma plants face between growth and defense. The higher expression of resistance genes and pathways in S. purpurea implies that constitutive resistance may be the basis for host choice by potato leafhoppers. Our results identify potential key processes and genes that determine the constitutive resistance of S. purpurea before PLH attack. However, in the no-choice feeding experiment, the willow plants were grown in greenhouse chambers, which are not their optimal environment.
Although the greenhouse conditions were chosen to allow comparisons under conditions normalized for all but the pest-injury treatment, the possibility of mild abiotic stress cannot be entirely excluded. If such mild abiotic stress did occur, it may have served as a priming stimulus that initiated adaptive immunity in S. purpurea, enabling it to better defend against the biotic stress during treatment. To further test for a priming effect in S. purpurea, it will be necessary to quantify and categorize the effects of various abiotic stresses, applied both in parallel with and sequentially to PLH treatment.

High correlations between NBS-LRR R genes and PLH-resistance and sex

Resistance genes (R genes) are ubiquitous in land plants, especially those encoding nucleotide binding site-leucine-rich repeat (NBS-LRR) proteins (Yang, 2008), and have been shown to contribute to host resistance to pathogens in many genera (McHale et al., 2006). In general, NBS domains contain highly conserved motifs (e.g., P-loop, kinase-2, and Gly-Leu-Pro-Leu motifs) involved in nucleotide binding, whereas the LRR domain mediates protein-protein interactions. Toll/interleukin-1 receptor (TIR)- and non-TIR-NBS-LRRs are highly abundant in the reference genomes of both S. purpurea and S. viminalis, as well as in that of a close relative, Populus trichocarpa (Tuskan et al., 2006). Over 400 members of this large R gene family have been annotated in the S. purpurea v1.0 reference transcriptome assembly, a family size similar to that of P. trichocarpa (Kohler et al., 2008). In our study, genes significantly correlated (P < 1e-6) with the PLH severity phenotype in F1 plants were enriched for the NBS-LRR R gene family. Based on pest symptom severity, the F1 progeny were categorized as susceptible or resistant (less susceptible) plants. The TIR clade of NBS-LRRs was more highly expressed in susceptible plants, whereas the coiled-coil (CC) NBS-LRR clade was more abundant in resistant or less-susceptible individuals. In addition, P2-dominant patterns of inheritance were over-represented among TIR-NBS-LRR genes, whereas P1-dominant patterns were over-represented among genes in the CC-NBS-LRR clade. Divergent evolution of NBS-LRR R genes between S. purpurea and S. viminalis might account for these striking patterns of uniparental dominant gene expression and PLH resistance among their progeny. Co-expression analyses of the F1 progeny clustered a large proportion of NBS-LRR gene family members into modules that were significantly correlated with resistance traits observed both in the greenhouse and in the field.
Of all modules, the plum2 module showed the highest correlation with log PLH severity (%) in F1 progeny (Figure 2.7A). Moreover, most genes within the plum2 module were strongly negatively correlated with log PLH severity (Figure 2.7B). Among them are eight genes with NBS-LRR motifs (Figure 2.7C). SapurV1A.0419s0010, the gene with the strongest inverse correlation with log PLH severity (%), was highly expressed in resistant and less-susceptible plants (Figure 2.7D) and is annotated as an apoptotic CC-NBS-LRR gene homologous to the Arabidopsis receptor kinase ZED1. In Arabidopsis, ZED1 is predicted to act as a decoy for the Pseudomonas syringae effector acetyltransferase HopZ1 (Lewis et al., 2013) and is required for recognition of this effector by the R gene ZAR1; mutant plants lacking ZED1 showed increased pathogen growth.

Another NBS-LRR clade, the BED finger NBS-LRRs, was highly correlated with progeny sex (R² = 0.75 to 0.99, P < 1e-60), and its members are concentrated primarily on S. purpurea chr15 and chr19 (Figure S2.3). Tandem duplication of NBS-LRR R genes is common in land plants, and these genes tend to accumulate in clusters over time. The three most significant BED finger NBS-LRRs, located near the end of chr17 (SapurV1A.1003s0080, SapurV1A.1003s0090, and SapurV1A.1379s0010), share a common architecture and lie in relatively close proximity to one another (~10 kb). The BED finger NBS-LRR with the greatest correlation with sex, SapurV1A.1005s0080 (R² = 0.99, P = 1e-75), is located in a peritelomeric region of S. purpurea chr19, a region orthologous to the peritelomeric sex determining region (SDR) on P. trichocarpa chr19. Sex quantitative trait loci (QTL) have been mapped to Salix chr15 in full-sib F1 and F2 biparental families and in association panels of both S. purpurea (Carlson et al., 2017; Zhou et al., 2017) and S. viminalis (Pucholt et al., 2015; Pucholt et al., 2017). Differences in gene content along the S. purpurea chr15 SDR or the putative ancestral Y chromosome (chr19) are likely to be more pronounced when comparing gene expression in plants of opposite sex undergoing biotic stress, especially for genes within regions of low recombination. The accumulation and differential expression of R gene clusters along the peritelomeric end of S. purpurea chr19 or within the SDR of chr15 could be a consequence of sexual conflict.

Conclusion

Overall, this comparison of the transcriptomic and phenotypic effects of PLH herbivory indicates that both induced and constitutive differences in gene regulation appear to play roles in the resistance of S. purpurea to PLH attack. The constitutively expressed stress-related genes may be the basis for host choice by PLH in the field. It is still not clear, however, whether host choice alone accounts for PLH resistance in S. purpurea, as forced feeding by PLH in the greenhouse caused some damage on 94006, accompanied by induction of a suite of biotic stress genes. This suggests that S. purpurea may be able to deploy inducible resistance mechanisms when PLH are not given a choice under field conditions.

Methods

Plant and pest materials

An F1 hybrid family, 407, was generated by crossing female S. purpurea 94006 with S. viminalis ‘Jorr’, a cultivar bred in Sweden. Progeny in the F1 family and their parents were planted in a field trial in Geneva, NY, USA, in 2014, in which each genotype was planted in a three-plant plot in each of four randomized complete blocks. Plants were arranged in single rows, with 1.83 m between rows and 0.46 m between plants within a row. PLH damage was surveyed that growing season (see details below) and again after coppicing during the 2015 growing season. Plant height was also measured after each growing season. Based on the field survey results, 18 hybrid progeny genotypes displaying a wide range of PLH susceptibility were selected for the subsequent no-choice feeding trial in a greenhouse, together with the parents 94006 and ‘Jorr’. Dormant cuttings (20 cm) were collected in March 2015 from one-year-old stems of plants growing in nursery beds and were planted in potting mix on June 9, 2015, with one cutting in each of four replicate 15 cm (~3 L) pots per genotype (80 pots in total). The four replicate pots of each genotype were grown inside a fine mesh cage (60 × 60 × 120 cm, BugDorm 6620) in a greenhouse with a daytime (14 h supplemental lighting) temperature of 26-28°C and a nighttime temperature of 20-22°C. A few cuttings that did not break bud were replaced with new cuttings within 7-10 d. Only three plants of the ‘Jorr’ parental genotype could be obtained in this manner for the experiment.

Potato leafhopper adults were collected from nursery beds of susceptible cultivars ‘Klara’ and ‘Stina’ using a gas-powered vacuum (modified leaf blower) fitted with a mesh bag. The insects were then sedated with CO2 and ca. 20 to 30 individuals were put into each of 20 vials with moist cotton and a fresh leaf and kept in a growth chamber overnight prior to starting the feeding trial.

Greenhouse no-choice feeding experiment

The 20 to 30 PLH adults from a single vial were released into each cage starting at 9 am on Day 1. Additional PLH were introduced the next day to replace any that had died. Leaf samples were collected according to a time course: time 0 (before treatment), and 6 h, 24 h, and 96 h after PLH introduction. This time course was chosen because leaf curling symptoms were observed within only a few hours after PLH introduction. For tissue sampling, a single young, expanded leaf was collected near the top of a dominant shoot of each plant, folded and pushed into a 2 mL grinding tube, quick-frozen in liquid nitrogen, and stored in a -80°C freezer. Plants were watered as needed through the mesh portals of the cages.

Phenotype measurements

In the greenhouse experiment, plant traits were measured and recorded at the beginning and end of PLH exposure, as well as after a recovery period. Damage severity (%) was visually scored to reflect tip necrosis, leaf curl and leaf yellowing for all plants at the end of PLH exposure (4 d) and again 1 week after insect removal (11 d). The stem length of each plant was measured before and after the experiment to calculate the percentage change in stem length, as an indicator of PLH impact on stem elongation. Up to four or five living PLH adults remained in every cage at the end of the experiment. Nymphs were observed on plants of three genotypes (11X-407-085, 11X-407-070 and 11X-407-102), indicating that the PLH were able to oviposit on those genotypes during the treatment. These three genotypes were among the least damaged by PLH in many previous pest surveys.
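The two greenhouse indicators reduce to simple arithmetic. The Python sketch below assumes that percent change in stem length was computed as 100 × (final − initial) / initial total stem length per plant, and that the day 4 and day 11 damage scores were combined as an unweighted mean, as the Figure 2.1 caption describes; the function names and example values are hypothetical.

```python
def percent_stem_change(initial_cm: float, final_cm: float) -> float:
    """Percent change in total stem length per plant (assumed formula)."""
    if initial_cm <= 0:
        raise ValueError("initial stem length must be positive")
    return 100.0 * (final_cm - initial_cm) / initial_cm


def mean_damage(day4_pct: float, day11_pct: float) -> float:
    """Unweighted mean of the day 4 and day 11 visual damage scores (%)."""
    return (day4_pct + day11_pct) / 2.0


# Hypothetical plant: 50 cm of total stem at day 0 and 65 cm at day 11
print(percent_stem_change(50.0, 65.0))  # 30.0
print(mean_damage(20.0, 35.0))          # 27.5
```

Expressing elongation relative to initial length keeps cuttings that started at different sizes comparable across genotypes.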

During 2014 and 2015, all genotypes (parents and progeny) were surveyed for various phenotypic traits in field trials. Three characteristic PLH injury symptoms (shoot tip damage, leaf necrosis and chlorosis, and leaf curl) were surveyed according to an established rubric on three occasions (August 2014, September 2014 and September 2015). Stem heights were also measured at three time points (December 2014, June 2015 and August 2015). Mean phenotype values were calculated for each genotype, and these means were used as phenotypic traits for each time point in WGCNA.

RNA extraction and sequencing

Total RNA was extracted from each leaf sample using a Spectrum™ Plant Total RNA Kit (Sigma-Aldrich, St. Louis, MO) and checked for quality using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA 95051, USA). Bar-coded RNA-Seq libraries were prepared on a robotic platform using the Illumina TruSeq Library Kit 2 for each of the 96 highest-quality total RNA samples (RIN > 6), and sequencing was conducted on an Illumina HiSeq 2500 sequencer at the Singapore Centre for Environmental Life Sciences Engineering at Nanyang Technological University. Two 100 × 100 nucleotide paired-end sequencing runs were conducted, yielding a total of 613 M reads for the 96 libraries. Data files in FASTQ format were provided for subsequent RNA-Seq data analysis. The data will be deposited in the NCBI Sequence Read Archive as a BioProject prior to publication.

Calculation and quantification of gene expression abundance

The raw sequencing reads were mapped to the predicted transcript sequences of the S. purpurea reference genome (Salix purpurea v1.0, DOE-JGI, Phytozome 12.0 https://phytozome.jgi.doe.gov/pz/portal.html#!info?alias=Org_Spurpurea) using Bowtie2 version 2.2.8 with the --local and --sensitive options. The resulting BAM files were processed with the Samtools ‘idxstats’ function to count the number of reads mapped to each transcript. The counts were stored in matrix format for further analyses.
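As a minimal sketch, assuming the per-transcript counts came from `samtools idxstats` (whose tab-separated report lists reference name, sequence length, mapped read segments and unmapped read segments, ending with a `*` row for unplaced reads), the Python snippet below merges several such reports into a transcript-by-library count matrix. The helper names and toy counts are hypothetical, and for paired-end data each mapped mate is counted as a separate segment.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_idxstats(handle: TextIO) -> Dict[str, int]:
    """Parse one `samtools idxstats` report into {transcript_id: mapped read segments}."""
    counts: Dict[str, int] = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the trailing '*' row that holds unplaced reads
        if not row or row[0] == "*":
            continue
        counts[row[0]] = int(row[2])  # third column = mapped read segments
    return counts


def build_count_matrix(
    reports: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Merge per-library counts into a transcript x library matrix (missing = 0)."""
    libraries = sorted(reports)
    transcripts = sorted({tx for counts in reports.values() for tx in counts})
    matrix = [[reports[lib].get(tx, 0) for lib in libraries] for tx in transcripts]
    return transcripts, libraries, matrix


# Toy reports for two hypothetical libraries (lengths and counts are made up)
lib_a = "SapurV1A.0419s0010.1\t2400\t120\t0\nSapurV1A.1003s0080.1\t3100\t45\t0\n*\t0\t0\t17\n"
lib_b = "SapurV1A.0419s0010.1\t2400\t98\t0\nSapurV1A.1003s0080.1\t3100\t60\t0\n*\t0\t0\t9\n"
reports = {"lib_A": read_idxstats(io.StringIO(lib_a)), "lib_B": read_idxstats(io.StringIO(lib_b))}
transcripts, libraries, matrix = build_count_matrix(reports)
print(matrix)  # [[120, 98], [45, 60]]
```

Filling absent transcripts with zero keeps every library column aligned to the same row order, which count-based tools such as DESeq2 expect.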

Differential gene expression analysis

The R package DESeq2 version 1.10.1 (Love et al., 2014), which models read counts with a negative binomial distribution, was used to identify statistically significant differential expression. Principal component analyses (PCA) were also performed with the DESeq2 package. The false discovery rate (FDR) was controlled using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). Genes with FDR < 0.001 and an absolute log2 fold change ≥ 1 were defined as significant differentially expressed genes (DEGs) in this study.
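DESeq2 reports Benjamini-Hochberg adjusted p-values in the `padj` column of `results()` by default, so the DEG call above amounts to a filter on `padj` and `log2FoldChange`. For illustration only, the Python sketch below implements the Benjamini-Hochberg step-up adjustment and the combined threshold; it does not reproduce DESeq2's independent filtering, and the function names are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(padj: float, log2fc: float, fdr: float = 0.001, min_lfc: float = 1.0) -> bool:
    """DEG rule from the text: FDR < 0.001 and |log2 fold change| >= 1."""
    return padj < fdr and abs(log2fc) >= min_lfc


padj = benjamini_hochberg([0.0001, 0.0004, 0.03, 0.5])
print([is_deg(q, lfc) for q, lfc in zip(padj, [-1.5, 0.6, 2.0, 3.0])])  # [True, False, False, False]
```

Requiring both a small FDR and a two-fold change guards against calling genes whose differences are statistically detectable but biologically negligible.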

Weighted gene correlation network analysis (WGCNA)

The R package WGCNA version 1.36 (Langfelder and Horvath, 2008) was used to identify modules of genes that share highly correlated expression patterns. Lowly expressed genes (read count < 5 in 80% of all libraries) were removed before WGCNA clustering. Raw counts were normalized using the varianceStabilizingTransformation function in DESeq2 (Love et al., 2014). A soft-thresholding power of 9 was used to construct the adjacency matrix so that the network met the scale-free topology criterion for optimal clustering. Outlier libraries were identified using an average-linkage hierarchical cluster tree based on Euclidean distance. Modules of genes with correlated expression were obtained using a stringency threshold of 0.75. To understand the physiological significance of each module, we correlated the 25 module eigengene expression profiles with physiological traits from both field and greenhouse trials, such as pest damage and growth rate, generating a full module-trait correlation table. For inheritance analysis, only genes with a sum normalized CPM > 1 in ≥ 50% of the samples were considered prior to network construction.
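The core WGCNA quantities can be sketched in a few lines of NumPy. The example below assumes variance-stabilized expression with genes in rows and samples in columns, uses WGCNA's default unsigned adjacency |cor|^β with β = 9, and computes a module eigengene as the first principal component of the standardized module expression. It omits the topological overlap matrix, dynamic tree cutting and module merging, and all function names are hypothetical rather than the WGCNA API.

```python
import numpy as np


def filter_low_expression(counts: np.ndarray, min_count: int = 5, max_low_frac: float = 0.8) -> np.ndarray:
    """Remove genes (rows) whose count is below `min_count` in >= 80% of libraries."""
    low_frac = (counts < min_count).mean(axis=1)
    return counts[low_frac < max_low_frac]


def unsigned_adjacency(expr: np.ndarray, power: int = 9) -> np.ndarray:
    """Soft-thresholded adjacency a_ij = |cor(gene_i, gene_j)| ** power (genes in rows)."""
    return np.abs(np.corrcoef(expr)) ** power


def module_eigengene(module_expr: np.ndarray) -> np.ndarray:
    """First right singular vector of the row-standardized module expression."""
    centered = module_expr - module_expr.mean(axis=1, keepdims=True)
    z = centered / module_expr.std(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eigengene = vt[0]
    # The sign of a singular vector is arbitrary; align it with average module expression
    if np.corrcoef(eigengene, z.mean(axis=0))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene


def module_trait_correlation(eigengene: np.ndarray, trait: np.ndarray) -> float:
    """Pearson correlation between a module eigengene and a sample-level trait."""
    return float(np.corrcoef(eigengene, trait)[0, 1])


# Toy module of two co-varying genes across four samples, and a matching trait
module = np.array([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]])
print(module_trait_correlation(module_eigengene(module), np.array([10.0, 20.0, 30.0, 40.0])))
```

Raising correlations to a power suppresses weak correlations (0.8 ** 9 is about 0.13) while preserving strong ones, which is what pushes the network toward scale-free topology.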

Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis

Salix gene IDs were first mapped to their A. thaliana homologs based on the gene annotation file of the Salix purpurea v1.0 reference genome (DOE-JGI). DEGs (each with a TAIR ID) were subjected to GO singular enrichment analysis against the Arabidopsis background in the AgriGO database v1.0 (Du et al., 2010). Fisher’s exact test was used for the enrichment analysis, and the Bonferroni correction was applied for multiple testing, with the significance level set to 0.05. DEGs were also subjected to KEGG enrichment analysis (http://www.genome.jp/kegg) to identify statistically enriched KEGG pathways (FDR < 0.05). MapMan (http://mapMan.gabipd.org) was used to display log fold changes in expression between the two parent genotypes on the cell function and metabolism overview maps.
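For readers unfamiliar with singular enrichment analysis, the test for one GO term reduces to a one-sided Fisher's exact test on a 2 × 2 table, which equals the upper tail of a hypergeometric distribution. The Python sketch below shows that calculation together with a Bonferroni adjustment; it is a generic illustration, not AgriGO's implementation, and the function names and toy counts are hypothetical.

```python
from math import comb
from typing import List, Sequence


def go_enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail) for over-representation.

    k: DEGs annotated with the GO term      n: total DEGs tested
    K: background genes with the GO term    N: total background genes
    """
    # P(X >= k) where X ~ Hypergeometric(population N, successes K, draws n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Bonferroni adjustment (controls the family-wise error rate), capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


# Toy example: 3 of 4 DEGs carry a term that marks 5 of 20 background genes
print(round(go_enrichment_pvalue(3, 4, 5, 20), 4))  # 0.032
```

Because Bonferroni multiplies each p-value by the number of terms tested, it is more conservative than an FDR procedure when many GO terms are evaluated.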

Inheritance of Gene Expression

To determine the mode of inheritance for each gene, the number of RNA-Seq reads mapped to individual genes was counted for the female (P1) and male (P2) parents and for the F1 progeny (H). Differentially expressed genes (FDR = 0.01) were identified using the exact test implemented in edgeR for negative-binomially distributed counts. Only genes with a sum normalized counts-per-million (CPM) > 1 in at least 50% of the samples were considered in the analyses. We used a custom R script to sort genes into the following six inheritance categories: (1) P1-dominant: H ≈ P1 and H ≠ P2; (2) P2-dominant: H ≈ P2 and H ≠ P1; (3) additive: P1 < H < P2 or P2 < H < P1; (4) overdominant: H > P1 and H > P2; (5) underdominant: H < P1 and H < P2; and (6) conserved: H ≈ P1 ≈ P2.
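The classification logic can be illustrated with a short Python sketch. It assumes that each pairwise edgeR comparison (H vs. P1, H vs. P2, P1 vs. P2) has already been reduced to +1 (first group significantly higher), -1 (significantly lower) or 0 (not significantly different). This is a hypothetical analog of the custom R script, not that script itself, and the "ambiguous" label for H matching both parents when the parents differ is our own assumption.

```python
def passes_cpm_filter(cpm_values, min_cpm=1.0, min_fraction=0.5):
    """Keep a gene if its CPM exceeds `min_cpm` in at least `min_fraction` of samples."""
    n_pass = sum(value > min_cpm for value in cpm_values)
    return n_pass >= min_fraction * len(cpm_values)


def classify_inheritance(h_vs_p1: int, h_vs_p2: int, p1_vs_p2: int) -> str:
    """Assign an inheritance class from signs of significant differences.

    Each argument is +1 if the first group is significantly higher, -1 if
    significantly lower, and 0 if not significantly different.
    """
    if h_vs_p1 == 0 and h_vs_p2 == 0:
        # H matches both parents; call it conserved only if P1 ~= P2 as well
        return "conserved" if p1_vs_p2 == 0 else "ambiguous"
    if h_vs_p1 == 0:
        return "P1-dominant"     # H ~= P1, H != P2
    if h_vs_p2 == 0:
        return "P2-dominant"     # H ~= P2, H != P1
    if h_vs_p1 > 0 and h_vs_p2 > 0:
        return "overdominant"    # H above both parents
    if h_vs_p1 < 0 and h_vs_p2 < 0:
        return "underdominant"   # H below both parents
    return "additive"            # H lies between the two parents


print(classify_inheritance(1, -1, -1))  # additive
```

Reducing each comparison to a sign makes the six categories mutually exclusive, so every gene that passes the CPM filter receives exactly one label.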

Tables

Table 2.1 Functional annotation of 10 hub transcription factors in the WGCNA darkgreen cluster.

Hub Transcription Factorα Transcription Factor Family Functional Annotation β

AT1G09540 MYB vasculature development; xylem development

AT1G32770 NAC lignin biosynthetic process; plant-type secondary cell wall biogenesis

AT4G28500 NAC secondary cell wall biogenesis

AT4G29230 NAC multicellular organism development; secondary cell wall biogenesis

AT2G01940 C2H2 auxin biosynthesis and transport

AT4G30080 ARF auxin-activated signaling pathway; cell division

AT1G75240 ZF-HD abscisic acid-activated signaling pathway

AT1G78700 BES1 brassinosteroid mediated signaling pathway

AT3G12250 bZIP systemic acquired resistance

AT5G12870 MYB defense response to fungus; plant-type secondary cell wall biogenesis

α: transferred TAIR ID.

β: functional ontology terms summarized from the Arabidopsis gene function descriptions in the TAIR database (https://www.arabidopsis.org).

Table 2.2 GO term enrichment of Differentially Expressed Genes (DEGs) between transcriptomes of parent genotypes S. purpurea 94006 and S. viminalis ‘Jorr’ at time 0.

Resistant S. purpurea

Gene Functional Ontology FDR α
response to stimulus 2e-22
response to chemical stimulus 1.1e-14
response to organic substance 3.2e-11
response to stress 1.2e-09
response to biotic stimulus 8.6e-08
response to chitin 2.3e-07

Susceptible S. viminalis

Gene Functional Ontology FDR α
cellular process 1.8e-27
metabolic process 3.4e-20
primary metabolic process 8.4e-17
cellular metabolic process 8.4e-17
biosynthetic process 8.6e-17
cellular biosynthetic process 1.9e-16

α False discovery rate (significance level α = 0.05).

Figures

Figure 2.1 Greenhouse phenotypic measurements of stem elongation rate and visual damage scores. A) Mean change in total stem length (%) per plant over the 11 d of PLH treatment in the no-choice greenhouse feeding trial. The length of all stems per plant was measured with a meter stick at time 0 and on day 11, when the experiment ended, and the percent change in total stem length per plant was calculated to reflect the impact of PLH on stem elongation over the 11 days. B) Damage severity (%) was scored for all plants on day 4 and day 11; the two scores were averaged, and the mean damage scores are plotted in panel B.

Figure 2.2 PCA plot of all RNA-Seq libraries of the two parent genotypes at four time points. Genotypes are indicated by color (orange: ‘Jorr’; green: 94006), and time points by shape (circle: time 0; triangle: 6 h; square: 24 h; cross: 96 h). The horizontal PC1 dimension, which accounts for 69% of the variance among all samples, separates the two genotypes. The vertical PC2 dimension, which accounts for 15% of the variance, separates time 6 h from the other time points for both genotypes.

Figure 2.3 Venn diagrams of up- and down-regulated differentially expressed genes (DEGs) identified separately via pairwise comparison of PLH-injured and uninjured libraries at all time points for each parent genotype. Panel A: S. purpurea 94006 up-regulated DEGs; Panel B: S. purpurea 94006 down-regulated DEGs; Panel C: S. viminalis ‘Jorr’ up-regulated DEGs; Panel D: S. viminalis ‘Jorr’ down-regulated DEGs. Colors represent the different pairwise comparisons against the Time 0 reference (red: 6 h vs. time 0; yellow: 24 h vs. time 0; green: 96 h vs. time 0).

Figure 2.4 Co-expression analyses of all genes across all samples. A) Hierarchical cluster tree showing ten modules of co-expressed genes. Each gene is represented by a leaf in the tree. The WGCNA cluster dendrogram on all 96 samples grouped the genes into 25 distinct modules (row: Merged Dynamic). B) WGCNA module-trait relationship table. On the left side, each row represents a co-expressed gene cluster, and each column represents a phenotype: from left to right, greenhouse stem growth rate, greenhouse pest damage, field shoot damage, field necrosis, field leaf curling, and field plant height. The field damage data (shoot damage, leaf necrosis and leaf curl) were averaged over three time points for this table. Three co-expressed gene clusters (darkgreen, magenta, and black) show strong positive or negative relationships with the six phenotypic traits measured in greenhouse and field trials (listed below the table), indicating that the genes in these modules are significantly related to willow resistance and growth traits. Module grey contains genes that were not assigned to any co-expression module; the light-colored cells in this row indicate low correlation with phenotypes. The right panel shows heatmaps of gene expression and the eigengene expression profiles across each library of parent genotypes S. purpurea 94006 and S. viminalis ‘Jorr’ for C) module darkgreen, D) module magenta and E) module black. The eigengene represents the standardized relative expression level of each library. Genotype, sampling time point (hat) and biological replicate number are indicated above each column in the eigengene profiles. Columns are vertically aligned for all three modules.

Figure 2.5 Internal regulatory relationships and connectivity network among genes in the darkgreen co-expressed gene cluster. The 10 hub transcription factors (AT1G09540, AT1G32770, AT1G75240, AT1G78700, AT2G01940, AT3G12250, AT4G28500, AT4G29230, AT4G30080, AT5G12870), which have the highest regulatory connectivity with other genes within this cluster, are highlighted in yellow. Green arrow: negative regulation; red arrow: positive regulation.

Figure 2.6 Overview of the distribution of log2 fold changes in gene expression between the parental species (S. purpurea vs. S. viminalis), generated with MapMan. (A) The cell function overview map shows differences in gene expression in various functional groups between S. purpurea and S. viminalis. Color-coded bars represent the ratio of gene expression in S. purpurea vs. S. viminalis (red, higher levels in S. purpurea; blue, higher levels in S. viminalis). The intensity of the color corresponds to the log2 fold change. (B) On the metabolism overview map, each BIN or subBIN is represented as a block in which each gene is displayed as a square, colored blue if the gene is more highly expressed in S. purpurea or red if it is more highly expressed in S. viminalis. Metabolites are displayed as circles and proteins as triangles.

*Only ratios with P values lower or equal to 0.05 are displayed.

Figure 2.7 Co-expression analyses of F1 S. purpurea × S. viminalis progeny. Panel A. Boxplots represent the significance of eigengene-trait correlations between pest damage severity and the 10 co-expressed gene modules with the highest correlation to percent severity of pest damage. The plum2 module had the highest module-trait correlation. Panel B. The scaled eigengene expression heatmap for the plum2 module and the scaled log percent damage severity values, ordered according to module membership (the correlation between each gene and the eigengene). The collection time of each sample (0, 6, 24, and 96 h; top) is represented by white, light grey, dark grey, and black boxes, respectively. Panel C. Manhattan plot of genome-wide significance for log damage severity (%). Gene models were aligned to the S. purpurea 94006 v2 reference assembly. The absolute R² multiplied by the corresponding -log10(p) (y-axis) is plotted against the physical position (Mb) of each gene (x-axis). The horizontal red line represents the genome-wide Bonferroni significance threshold, -log10(0.05/n), where n is the number of genes tested. Panel D. Scatterplot of the regression of log percent severity on the normalized expression of the S. purpurea NBS-LRR gene SapurV1A.0419s0010.1 (R² = 0.5, p = 7e-13).
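The two quantities plotted in the Manhattan panel are straightforward to compute. The Python sketch below evaluates the Bonferroni line -log10(0.05/n) and the y-axis score |R²| × -log10(p); the gene count of 30,000 is a hypothetical placeholder, while the R² and p values are those reported for SapurV1A.0419s0010.1 in Panel D.

```python
import math


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Genome-wide Bonferroni line on the -log10 scale: -log10(alpha / n)."""
    return -math.log10(alpha / n_tests)


def manhattan_score(r_squared: float, p_value: float) -> float:
    """y-axis value described in the caption: |R^2| x -log10(p)."""
    return abs(r_squared) * -math.log10(p_value)


n_genes = 30000  # hypothetical number of genes tested
print(round(bonferroni_threshold(n_genes), 3))   # 5.778
print(round(manhattan_score(0.5, 7e-13), 3))     # 6.077
```

Weighting -log10(p) by R² down-ranks genes whose association is statistically strong but explains little of the phenotypic variance.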

Supplementary table and figures

Table S2.1 Inheritance patterns of global gene expression among all F1 S. purpurea × S. viminalis progeny individuals. Rows are ordered by time, starting with the control time point (T0), then numerically by clone identifiers. For each time-point, the total number of genes belonging to inheritance classes was summarized by the family average.

Clone Time P1-dominant P2-dominant Overdominant Underdominant Additive Conserved
11X-407-004 0 1781 1045 52 156 85 27332
11X-407-044 0 1001 1198 46 19 78 28109
11X-407-059 0 1316 1176 58 40 59 27802
11X-407-084 0 1366 1155 119 46 51 27714
11X-407-089 0 1719 1065 54 288 64 27261
11X-407-102 0 2274 1002 73 283 86 26733
11X-407-122 0 1404 1236 61 242 69 27439
11X-407-004 6 1395 1508 43 276 100 27129
11X-407-044 6 775 1916 81 22 98 27559
11X-407-059 6 1081 1503 59 21 77 27710
11X-407-084 6 1048 1526 66 69 89 27653
11X-407-089 6 1958 1119 59 210 79 27026
11X-407-102 6 2267 1065 67 247 98 26707
11X-407-122 6 1959 1191 53 228 82 26938
11X-407-004 24 1209 1188 51 76 56 27871
11X-407-044 24 904 1012 34 25 47 28429
11X-407-059 24 1558 1465 129 330 79 26890
11X-407-084 24 832 980 38 13 33 28555
11X-407-089 24 1589 1236 59 253 61 27253
11X-407-102 24 967 944 36 49 34 28421
11X-407-122 24 1851 1705 136 322 100 26337
11X-407-004 96 804 1186 25 39 41 28356
11X-407-044 96 1480 1437 193 105 78 27158
11X-407-059 96 750 1570 36 91 55 27949
11X-407-084 96 630 1248 54 20 53 28446
11X-407-089 96 1309 2029 140 586 82 26305
11X-407-102 96 846 1391 64 61 44 28045
11X-407-122 96 1098 1532 45 304 54 27418

Family Average by Time
Time P1-dominant P2-dominant Overdominant Underdominant Additive Conserved
T0 1552 1125 66 153 70 27484
T6 1498 1404 61 153 89 27246
T24 1273 1219 69 153 59 27680
T96 988 1485 80 172 58 27668

Figure S2.1 Hierarchical tree graphs of over-represented GO (Gene Ontology) terms in the biological process category for co-expressed genes in the darkgreen, magenta and black modules, from singular enrichment analysis generated by AgriGO. Each box represents a GO term in the biological process category, labeled with the GO term ID and term definition. Significantly enriched GO terms (FDR ≤ 0.05; FDR value shown in brackets after the term ID) are filled with red-yellow colors, while non-significant terms are shown as white boxes. The degree of color saturation of a box is positively correlated with the significance level of the term. The color and type of lines represent different regulatory relationships (explained in the legend window). GO terms become more specific from the top to the bottom of the hierarchy.

Figure S2.2 Comparison of plant hormone signal transduction initiation between the two parent genotypes in response to PLH attack (Panel A: resistant parent S. purpurea; Panel B: susceptible parent S. viminalis). Induced genes are highlighted in red.

Figure S2.3 Chromosome-wide patterns of inheritance of gene expression in F1 S. purpurea × S. viminalis progeny individuals. For each of the 19 chromosomes of Salix, the middle black bar separates T0 (left) from T6 (right). An equal number of bins (n = 12), ordered by physical position, represent the total number of genes in each inheritance class (scale, top left). Inheritance classifications are stacked within each bin and are colored according to the legend (center top).

Figure S2.4 Manhattan plot of the genome-wide distribution of significance of sex-biased gene expression in F1 S. purpurea × S. viminalis progeny individuals. The absolute R² multiplied by the corresponding -log10(p) (y-axis) is plotted against the physical position (Mb) of each gene (x-axis). The horizontal red line represents the genome-wide Bonferroni significance threshold, -log10(0.05/n). Gene models were aligned to the S. purpurea 94006 v2 reference assembly.


Figure S2.5 Distribution of inheritance classes of genes among F1 S. purpurea × S. viminalis progeny individuals. Left column: scatter plots of classes of gene expression inheritance patterns. Center column: bar charts of the same data for classes of gene expression inheritance patterns. Right column: bar charts of inheritance patterns for P1>P2 vs. P2>P1. Replicates for each gene were summed for each time point; genes were filtered by expression (CPM > 0.5 in at least 30% of samples), and inheritance classifications were assigned only to genes with significant differential expression (FDR = 0.05).
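The expression filter in this caption can be sketched in Python as follows. The sketch reads "cpm>0.5 >=30%" as "counts per million above 0.5 in at least 30% of samples". That interpretation, the function names and the toy counts are all assumptions, and the step that sums replicates is not shown.

```python
def counts_per_million(counts):
    """Convert one sample's raw read counts (gene -> count) to counts per million."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def filter_by_cpm(samples, min_cpm=0.5, min_fraction=0.30):
    """Return genes whose CPM exceeds min_cpm in at least min_fraction of samples.

    samples: list of dicts, one per sample, mapping gene -> raw read count.
    """
    cpm = [counts_per_million(s) for s in samples]
    genes = set().union(*(s.keys() for s in samples))
    needed = min_fraction * len(samples)
    kept = []
    for gene in sorted(genes):
        # Count the samples in which this gene is expressed above the cutoff.
        n_pass = sum(1 for sample_cpm in cpm if sample_cpm.get(gene, 0.0) > min_cpm)
        if n_pass >= needed:
            kept.append(gene)
    return kept


# Hypothetical counts for three samples of one million reads each.
samples = [
    {"g1": 999_999, "g2": 1, "g3": 0},
    {"g1": 1_000_000, "g2": 0, "g3": 0},
    {"g1": 1_000_000, "g2": 0, "g3": 0},
]
print(filter_by_cpm(samples))
```

In this toy case, g2 is kept because its CPM of 1.0 exceeds 0.5 in one of three samples (33%). Raising min_fraction to 0.5 would drop it.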


CHAPTER 3

Comparative Metagenomics Reveals the Effects of Geography and Host Genotype on Willow Rhizosphere Microbial Community2

Wanyan Wang1, John E. Carlson1, Eric S. Fabio2, Lawrence B. Smart2

1. Ecosystem Science and Management, Pennsylvania State University, University Park, PA, USA

2. Department of Horticulture, Cornell University, Geneva, NY USA

2 A modified (shorter) version of this chapter has been prepared for submission as: Wanyan Wang1, John E. Carlson1, Eric S. Fabio2, Lawrence B. Smart2 (2018) Comparative Metagenomics Reveals the Effects of Geography and Host Genotype on Willow Rhizosphere Microbial Community.

Abstract

Plant roots are colonized by tens of thousands of microorganisms, and these microbes and their metabolites have profound effects on host physiology and development. However, the factors that determine the rhizosphere microbial profile are poorly understood. This study compared the microbial communities of shrub willow grown in plantations at several sites in the northeastern US. In total, 130 rhizosphere soil samples representing three geographic locations (Rock Springs in Pennsylvania, Fredonia in New York and Mylan Park in West Virginia) and twelve willow genotypes were analyzed by shotgun metagenomic sequencing, spanning a 3-year period from planting to harvest. Both geographic location and host genotype affected the structure and predicted function of willow rhizosphere microbiomes, and differences related to site were greater than those related to plant genotype. The bacterial genera Mycobacterium, Methylobacterium, Frankia, Rhodopseudomonas and Nitrobacter, which include well-studied plant growth-promoting microorganisms, were significantly correlated with willow biomass yields in NY and PA, suggesting that they may play vital roles in willow growth and yield.

Introduction

Soil microbes are the dominant drivers of biogeochemical activities and cycles in terrestrial ecosystems (Fierer et al., 2012). A healthy plant is colonized by, and associated with, numerous microorganisms, which together with the plant constitute a holobiont whose holobiome is the combination of the plant host genome and the metagenomes of its microbial community (Terrazas et al. 2016). The rhizosphere is the narrow zone of soil that surrounds plant roots and is chemically, biologically and physically influenced by them (Pinton et al. 2007); it is a hot spot where direct interactions between the soil microbiome and the host plant occur. The three important constituents of the rhizosphere are the plant root, the soil and the microbiome. A growing body of research highlights the significance of the interplay between plant hosts and microbes. Relationships between individual microbes and their plant hosts range from mutualistic to commensalistic and pathogenic (Schlaeppi and Bulgarelli 2015). Plants can benefit from interactions with microbes that promote plant growth through enhanced nutrient uptake, hormone production and disease suppression (Mishra et al. 2012; Bulgarelli et al. 2013). For example, mycorrhizal plants and legumes benefit from the phosphorus and nitrogen supplied by mutualistic mycorrhizal fungi and rhizobia, respectively (Bouwmeester et al. 2007; Oldroyd et al. 2011), and these associations have practical applications in agriculture. Microbes with proven plant growth-promoting traits are defined as plant growth-promoting rhizobacteria (PGPR) and plant growth-promoting fungi (PGPF) (Kloepper and Schroth, 1978; Meera et al., 1994). To achieve high crop yields, traditional agricultural management, including fertilization and pest and disease control, relies mainly on chemical products, which can be hazardous to the environment (Tilman et al. 2002). Understanding the interplay between plants and their associated microbiomes, and developing ways to manipulate plant microbiota with beneficial traits, is therefore a promising approach for increasing crop yield while maintaining sustainable agriculture (Schlaeppi and Bulgarelli 2015). However, how root-associated microbiota affect plant growth and fitness and, in turn, how plant hosts shape and maintain their microbial communities remain obscure. Exploring the extended plant genome, that is, the plant microbiome, will shed light on how plant yield and fitness respond to external environments, and unraveling the net result of the microbial activities it encodes will ultimately be key to understanding and exploiting the full yield potential of a crop plant.

The advent of high-throughput sequencing technology enables deep and comprehensive profiling of plant microbiota structure, function and networking. Metagenomics is a holistic approach that simultaneously profiles the mixture of genomes in a microbial community (Riesenfeld et al. 2004). In this study, we applied DNA shotgun sequencing to profile the taxonomic and functional structure of microbial communities before and after willow planting. Our objective was to determine the extent to which host plant and geographic factors explain the assembly of the rhizosphere microbial community. For this purpose, we compared the taxonomic and functional profiles of soil samples from three sites in the northeastern United States before and after willow cultivation.

Factors determining rhizosphere community composition

Research has demonstrated that both the soil environment and the plant host influence the structure of rhizosphere microbiota (Badri et al. 2013; Bulgarelli et al. 2012, 2015). However, the relative contributions of edaphic factors and plant traits to shaping root microbiome composition vary with the soil environment. Plants can deposit nearly 11% of net photo-assimilated carbon and 10-16% of total plant nitrogen into the rhizosphere (Jones et al. 2009), which increases the biomass and activity of microorganisms relative to bulk soil (Sorensen 1997; Raaijmakers et al. 2009). This process, referred to as rhizodeposition, releases various carbohydrates, amino acids and organic acids, as well as root border cells, providing a reservoir of nutrients for rhizosphere microbes (Bais et al. 2006). Because most soil microbial species are organotrophs, which obtain their energy from organic compounds (Dar 2009), rhizodeposition stimulates specific microorganisms to prosper in the rhizosphere and rhizoplane, driving the formation of microbiomes distinct from the surrounding soil biome. Besides providing an energy source, plants can influence microbial community structure by altering physical microhabitats and changing soil conditions (Hooper et al., 2000; Wardle, 2006; Millard & Singh, 2010; Eisenhauer et al., 2011).

On the other hand, a variety of soil properties can also strongly influence microbial communities, including soil texture (Girvan et al. 2003), nitrogen availability (Frey et al. 2004), phosphate availability (Faoro et al. 2010) and pH (Fierer and Jackson 2006; Lauber et al. 2008; Rousk et al. 2010). Fierer and Jackson (2006) suggested that, of all these edaphic factors, soil pH has the dominant influence on soil microbial community structure.

Besides soil properties and vegetation type, other environmental factors, such as seasonal and land management-related factors, can also shift microbial community composition by changing the physical and chemical properties of soil (Schutter et al. 2001). The relative impacts of these different factors on the development of soil microbial communities, and whether they act independently or synergistically, remain largely unknown.

Plant Growth Promoting Rhizobacteria (PGPR)

PGPR (plant growth-promoting rhizobacteria) are a broad category of bacteria characterized by three features: (1) the ability to colonize plant root surfaces or the soil around them; (2) at least one plant growth-promoting trait; and (3) the ability to survive, multiply and express their growth-promoting traits for sufficiently long periods (Kloepper, 1994). Based on the mechanisms by which they promote host plant growth, PGPR can be classified into several categories: biofertilizers (facilitate nutrient uptake), phytostimulators (modulate plant hormone levels), rhizoremediators (alleviate soil pollution) and biopesticides (biologically control diseases and pests) (Somers et al. 2004; Antoun and Prévost, 2005). In addition, according to their proximity to plant roots, PGPR are grouped as extracellular (ePGPR), which inhabit the rhizosphere soil, and intracellular (iPGPR), which enter root cells (Gray and Smith 2005); some iPGPR can induce the formation of N2-fixing nodules (Figueiredo et al., 2011). Well-known ePGPR include Agrobacterium, Arthrobacter, Azotobacter, Azospirillum, Bacillus, Burkholderia, Caulobacter, Chromobacterium, Erwinia, Flavobacterium, Micrococcus, Pseudomonas and Serratia (Bhattacharyya and Jha, 2012), and iPGPR include Allorhizobium, Azorhizobium, Bradyrhizobium, Mesorhizobium and Rhizobium (Ahemad and Kibret 2014).

Examples of mechanisms of plant growth promotion:


1. Nitrogen fixation. Nitrogen plays a vital role in plant physiology. However, available soil nitrogen is usually limited, which restrains plant growth and productivity. Nitrogenase is an enzyme produced by certain bacteria that converts atmospheric nitrogen into ammonia, a form plants can assimilate (Kim and Rees, 1994). Nitrogen fixation by microorganisms accounts for about half of all nitrogen fixed globally, providing a natural and economical alternative to manufactured nitrogen fertilizer. There are two kinds of nitrogen-fixing bacteria: symbiotic and non-symbiotic. Symbiotic nitrogen-fixing bacteria include rhizobia (e.g. Rhizobium, Bradyrhizobium), which establish symbiotic relationships mostly with legume hosts and form nodule structures in the host root system. Another group of N2-fixing symbionts belongs to the family Frankiaceae (e.g. Frankia), which induces nodule formation in several non-legume trees. Non-symbiotic nitrogen-fixing bacteria do not require host plants. They can be free-living, associative in the rhizosphere, or colonize roots as endophytes, and they provide only a small amount of fixed nitrogen compared with the symbiotic group (Glick, 2012). Known non-symbiotic nitrogen fixers include , Azospirillum, Azotobacter, Gluconacetobacter diazotrophicus and Azoarcus (Bhattacharyya and Jha, 2012).

2. Phosphate solubilization. Phosphate is another essential nutrient for plants. Although total soil phosphate can be high, plants can assimilate phosphorus only in two soluble forms, the monobasic (H₂PO₄⁻) and dibasic (HPO₄²⁻) ions (Bhattacharyya and Jha, 2012), which are often deficient in soil. Phosphate fertilizer is frequently applied to crops, but its efficiency is low because plants absorb only a small proportion, with most forming insoluble complexes in the soil (Mckenzie and Roberts, 1990). Microorganisms that solubilize phosphate can facilitate phosphate uptake by plants. Phosphate-solubilizing microorganisms (PSM) examined in previous studies include Azotobacter, Bacillus, Beijerinckia, Burkholderia, Enterobacter, Erwinia, Flavobacterium, Microbacterium, Pseudomonas, Rhizobium and Serratia (Bhattacharyya and Jha, 2012). In many cases, inoculation with phosphate-solubilizing bacteria not only increases P availability but also promotes plant growth through other mechanisms, such as biological nitrogen fixation and increased uptake of trace elements (Suman et al., 2001; Ahmad et al., 2008; Zaidi et al., 2009).

3. Siderophore production. Iron is another essential nutrient for both microbial and plant physiological activities. In oxic environments, iron exists mainly as Fe3+ in insoluble hydroxides and oxyhydroxides, which limits its assimilation by plants (Rajkumar et al., 2010). Some bacteria secrete chelators, referred to as siderophores, that bind iron with high association constants. Siderophore-iron complexes can be reduced to Fe2+ at the bacterial cell membrane and taken up by the cell. The siderophore is then released back into the soil to bind iron again (Rajkumar et al., 2010; Neilands, 1995). Plants can use bacterial siderophores via siderophore-Fe uptake or a ligand exchange reaction (Schmidt, 1999). Among siderophore-producing bacteria, Pseudomonas is the best known for providing iron nutrition that promotes plant growth (Vansuyt et al., 2007; Sharma et al. 2003).

4. Auxin (IAA) production. Auxins are a class of phytohormones that play crucial roles in plant growth and defense. They promote cell division, cell elongation and seed germination, induce vascular tissue development, control fruit maturation and transmit defense response signals. Patten and Glick (1996) reported that almost 80% of isolated rhizobacteria were able to synthesize auxin. Bacterial auxin can facilitate the growth and development of the root system, increasing the space and potential for interactions between rhizobacteria and the host plant. Most Rhizobium species produce IAA (Ahemad and Khan, 2012b, d, f; Ahemad and Khan, 2011e, j), which is necessary for nodule formation in legumes (Glick, 2012; Spaepen et al., 2007).

5. Biological control. Using microorganisms to control soil-borne pathogens is an environmentally sound approach to disease control in agriculture (Lugtenberg and Kamilova, 2009). This can be achieved through multiple PGPR activities against pathogens, including induced systemic resistance (ISR), antibiosis, and competition for nutrients and space. PGPR applied as biocontrol agents (BCA) on crops include Bacillus (Bt) and Pseudomonas, and PGPF include Trichoderma spp. (Babbal et al., 2017).

Results

Data characteristics and quality assessment

A total of 130 rhizosphere soil samples represented three geographic locations (Rock Springs in Pennsylvania, Fredonia in New York and Mylan Park in West Virginia), twelve willow genotypes and two sampling times (Table S3.1). Microbial community DNA was subjected to shotgun metagenomic sequencing, generating a total of 1,193,050,340 paired-end reads of 150 base pairs. The percentage of reads per sample that passed quality control (QC) ranged from 75.91% to 97.02%, with a mean of 94.72%. One outlier sample, in which only 46.59% of reads passed QC, was excluded from the analysis. Of the sequences that passed QC, an average of 0.07% (range 0.05% to 0.12%) were annotated as ribosomal RNA genes. An average of 32.95% (range 21.09% to 39.64%) encoded predicted proteins annotated with known functional categories, a percentage within the range typically observed in shotgun metagenomic sequence classification (Prakash et al., 2012). The annotation analyses identified 11,089 functionally classified features and 4,779 unique taxa of bacterial, eukaryotic, viral and archaeal origin, at ranks from domain down to species. Overall, 90% or more of the sequences belonged to bacteria. Rarefaction curves were plotted separately for willow rhizosphere samples and pre-planting soil samples (Figure S3.3).
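A rarefaction curve plots the expected number of taxa against sequencing depth. One standard way to compute it assumes that reads are subsampled without replacement and uses the hypergeometric expectation E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)]. Here N is the sample's total read count, N_i is the count for taxon i, and n is the subsampling depth. The Python sketch below is not necessarily how Figure S3.3 was produced, and its function names and counts are hypothetical. It works in log space so that large read counts do not overflow.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def rarefy(counts, depth):
    """Expected number of taxa in a random subsample of `depth` reads drawn
    without replacement from one sample (counts = reads per taxon)."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    expected = 0.0
    for c in counts:
        if c == 0:
            continue
        if total - c < depth:
            # Every subsample of this size must contain at least one read of the taxon.
            expected += 1.0
        else:
            # P(taxon absent) = C(total - c, depth) / C(total, depth)
            p_absent = math.exp(log_comb(total - c, depth) - log_comb(total, depth))
            expected += 1.0 - p_absent
    return expected


def rarefaction_curve(counts, depths):
    """(depth, expected taxa) pairs for plotting a rarefaction curve."""
    return [(d, rarefy(counts, d)) for d in depths]


# Hypothetical taxon counts for one sample.
sample = [120, 45, 30, 3, 1, 1]
for depth, taxa in rarefaction_curve(sample, [10, 50, 100, 200]):
    print(depth, round(taxa, 2))
```

A curve that flattens at high depth suggests that additional sequencing would reveal few new taxa in that sample.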

Willow cultivation, geographic location and willow genotype all affect soil microbial community structure

To visualize sample-to-sample distances, all samples were projected onto a principal coordinate analysis (PCoA) plot (Figure 3.2a) based on Bray–Curtis dissimilarities at the species level. On this plot, the PCoA1 and PCoA3 axes explained 31.30% of the variance among all samples and primarily separated pre-planting soil microbiomes from those sampled after willow growth. In addition, willow rhizosphere microbiomes grouped strictly by geographic location along the PCoA2 axis, which explained 9.71% of the variance. The pre-planting samples did not show the same geographic pattern. Previous studies have found that plant host genotype affects the structure of rhizosphere microbial communities (Kuske et al., 2002; Brusetti et al., 2005; Aleklett et al., 2015; Bulgarelli et al., 2015). In the PCoA plot (Figure 3.2a), however, an effect of host genotype on community composition was not readily apparent. We therefore performed a PERMANOVA test on the Bray-Curtis distance matrix of all microbiomes to evaluate the effects of both host genotype and geographic location. Both factors significantly influenced microbial community structure. Geographic location had the greater effect (p = 0.001), and host genotype had a lesser effect (p = 0.031) (Table 3.1).
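The distance-based test described above can be sketched in plain Python. The sketch computes Bray-Curtis dissimilarities, then a one-factor PERMANOVA pseudo-F, with a p-value obtained by permuting group labels. This is a simplified illustration only. The analysis above evaluated genotype and location together, whereas this sketch handles one factor. The function names and toy counts are hypothetical, and it does not reproduce any particular package's implementation.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


def pseudo_f(d, labels):
    """One-factor PERMANOVA pseudo-F from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, labels, permutations=999, seed=0):
    """Pseudo-F and permutation p-value obtained by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(d, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Hypothetical species counts for three samples from each of two sites.
site_a = [[10, 0, 1], [9, 1, 1], [11, 0, 2]]
site_b = [[0, 10, 1], [1, 9, 0], [0, 11, 1]]
d = distance_matrix(site_a + site_b)
f_stat, p_value = permanova(d, ["A"] * 3 + ["B"] * 3, permutations=199)
print(round(f_stat, 2), p_value)
```

With only six samples, just 2 of the 20 possible three-versus-three labelings separate the sites perfectly. The smallest attainable p-value is therefore about 0.1, which illustrates why PERMANOVA on larger sample sets (such as the 130 samples here) can reach p = 0.001.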

Microbial diversity in the soil microbiome is influenced by willow planting and geographic location

To characterize the diversity of each microbiome and investigate how bacterial α-diversity was affected by willow planting and geographic location, we calculated Shannon's diversity index for the microbial community in each sample. Box plots separated by time point and geographic location (Figure 3.3) showed that α-diversity increased after willow planting at all three sites, and Tukey tests confirmed that willow planting significantly influenced community diversity. In addition, the two well-established sites, Fredonia in NY and Rock Springs in PA, had very similar levels of species diversity. Both showed a smaller range of variation than the Mylan Park site in WV, where willows fared poorly and had a low survival rate. The Mylan Park samples had significantly lower microbial diversity and a larger range of variation. Similar changes in microbial diversity were observed by Rodrigues et al. (2013), who examined soil microbial diversity after conversion of Amazon rainforest to agricultural land. They found that conversion locally increased the taxonomic and phylogenetic diversity of soil bacteria while making communities more uniform across locations.
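Shannon's index summarizes both the richness and the evenness of a community as H' = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to taxon i. The minimal Python sketch below assumes the natural logarithm, although some tools use log2 instead. The counts are hypothetical, and the Tukey tests are not shown.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


# Hypothetical taxon counts: an even and an uneven community of equal size.
even = [25, 25, 25, 25]
uneven = [85, 5, 5, 5]
print(round(shannon_index(even), 3), round(shannon_index(uneven), 3))
```

Both toy communities contain four taxa, but the even one scores higher. For this reason, a rise in H' after planting can reflect greater evenness as well as additional taxa.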

Relative abundances of the dominant taxa vary on geographic scale

Across all soil samples in this study, we identified 63 microbial phyla in total, 5 of which (, , , , ) were considered dominant, together comprising on average more than 70% of the total abundance within each sample (Figure 3.4). The relative abundance profiles of microbial phyla appeared to differ between the pre-planting and post-planting groups of samples. In general, two dominant phyla, Firmicutes and Actinobacteria, decreased in relative abundance after years of willow cultivation, whereas the most dominant phylum, Proteobacteria, increased. Because Actinobacteria can sporulate and enter a quiescent stage in response to stress, a high abundance of Actinobacteria within these microbiomes may indicate stress conditions (Naylor et al., 2017). Bacteroidetes and Proteobacteria, which have been suggested to be fast-growing copiotrophic microbes, are usually more abundant in rhizosphere soil because of the rich carbon resources in the rhizosphere. Acidobacteria, described as oligotrophic in previous research (Fierer et al., 2007), would therefore be expected to have lower relative abundance in rhizosphere soil than in C-poor bulk soil. However, the opposite was observed in our samples. Acidobacteria increased in relative abundance between pre-planting and harvest time, especially at the Rock Springs site. As Fierer et al. (2007) noted, the copiotroph-oligotroph spectrum may not apply to all taxa within a given phylum.

Within the willow rhizosphere samples, microbial communities from the same geographic location shared very similar taxonomic profiles at all three sites. Some phyla showed geographic specificity, being associated with, or relatively abundant at, only one site. For example, the fungal phylum Basidiomycota had a particularly high relative abundance at Fredonia (4% on average, compared with 0.3% at the other two sites). The prokaryotic phylum Acidobacteria made up a particularly high proportion at Rock Springs (5% on average, compared with 2% at the other two sites), and the viral family Microviridae prevailed at Mylan Park (accounting for 10%). By contrast, among the pre-planting samples, relative abundance profiles changed abruptly rather than gradually between communities from the same site. This indicates less similarity in taxonomic structure among microbiomes within a site.

‘Core Microbiome’ in willow rhizosphere

Identifying a core microbiome is one approach to understanding the specific and stable associations within host plant-microbiome holobionts amid complex microbial assemblages. A core microbiome is usually defined as the set of microbes present across multiple samples from similar environmental conditions. Among all willow rhizosphere microbiomes across sites, we detected a core microbiome of 156 genera, each present at more than 0.1% relative abundance in at least 80% of the microbial communities (Figure 3.4). These 156 genera represented 13.2% of all OTUs and 70.1% of total read count abundance. Of them, 97 genera were also commonly present across sites before willow planting. This suggests that many core genera of the willow rhizosphere microbiome are not specific to the willow rhizosphere but are also common in the soil environment. The remaining 59 genera, which were not commonly present in pre-planting soil microbial communities, were enriched in the willow rhizosphere, suggesting that willow recruits host-specific microbes during establishment (Table 3.2). Of these 59 genera, 37 belonged to the phylum Proteobacteria and the rest to 9 other phyla (Figure 3.5).
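The core-microbiome criterion above selects genera with more than 0.1% relative abundance in at least 80% of samples. The Python sketch below applies that criterion, then separates the shared core genera from the rhizosphere-enriched ones with simple set operations. The function name and toy genus counts are hypothetical. The sketch is not the pipeline that produced the 156, 97 and 59 genera reported here.

```python
def core_genera(samples, min_rel_abundance=0.001, min_prevalence=0.80):
    """Genera whose relative abundance exceeds min_rel_abundance (0.1%)
    in at least min_prevalence (80%) of the samples.

    samples: list of dicts, one per sample, mapping genus -> read count.
    """
    n_samples = len(samples)
    n_passing = {}
    for counts in samples:
        total = sum(counts.values())
        for genus, count in counts.items():
            if total > 0 and count / total > min_rel_abundance:
                n_passing[genus] = n_passing.get(genus, 0) + 1
    return {g for g, k in n_passing.items() if k >= min_prevalence * n_samples}


# Hypothetical genus counts: five rhizosphere and five pre-planting samples.
rhizosphere = [{"GenusA": 900, "GenusB": 100}] * 4 + [{"GenusA": 1000, "GenusB": 0}]
pre_planting = [{"GenusA": 1000, "GenusB": 0}] * 5

rhizo_core = core_genera(rhizosphere)
shared = rhizo_core & core_genera(pre_planting)  # also common before planting
enriched = rhizo_core - shared                   # candidate willow-recruited genera
print(sorted(rhizo_core), sorted(shared), sorted(enriched))
```

In this toy case, GenusB clears the 0.1% cutoff in four of five rhizosphere samples (exactly 80%) but in none of the pre-planting samples. It therefore falls into the enriched set, which corresponds to the 59 genera in the text.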

‘Functional Core’ of rhizosphere microbiome observed in the successfully-established willow trials

Changes in the microbiota can also be approached from a functional perspective. Most studies of microbial communities have focused on taxonomic composition, and little attention has been paid to microbiota function, which is more directly relevant to the effect of the microbiota on host function. Shotgun sequence data provide not only taxonomic profiles but also predictions of the functional attributes of a community. Based on the functional annotation of all samples, we also generated a PCoA plot of functional profiles (Figure 3.2b). Like the taxonomic PCoA plot, it captured the separation between pre-planting and post-cultivation communities. However, post-cultivation samples did not group by geographic location. Instead, communities from the two well-established willow trials, Rock Springs and Fredonia, grouped closely together, whereas those from Mylan Park, a poorly performing trial, formed a distinct cluster. This finding suggests a “functional core” in the mature willow rhizosphere microbiota, in which taxonomy may vary while functional structure remains stable to maintain the relationship with the host plant. This stability may arise from functional redundancy among different microbes. Such functional redundancy has been reported in the human gut microbiota (Lozupone et al., 2012; Moya and Ferrer, 2016) and in an alga-associated marine microbiome (Burke et al., 2011). These findings support the idea that, to some extent, species are interchangeable within a given microbiota in terms of function.

Promising plant growth-promoting microbes associated with willow biomass yield

We also quantified the associations between microbial genera and specific explanatory variables, namely geographic location, willow genotype and willow yield. Each association was calculated as the Pearson's correlation between a quantitative or binary variable and the relative abundance of a genus, and the results are shown as a heatmap (Figure 3.7). Most willow genotypes were not associated with specific microbes. The exception was cultivar ‘Otisco’, which, unlike the other genotypes, was strongly associated with a suite of microbes. A previous study testing the adaptation of different willow cultivars to various environments found that ‘Otisco’ had the most stable yields across environmental conditions (Fabio et al. 2017). In this context, we speculate that the recruitment of particular microbes by ‘Otisco’ contributes to its stable biomass yield. Testing this hypothesis will require experiments that control other variables more strictly and include more genotypes.
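The association measure described above is a Pearson's correlation between a genus's relative abundance and an explanatory variable. A categorical variable such as site or cultivar enters the correlation as a 0/1 indicator, which gives a point-biserial correlation. The Python sketch below is illustrative only. The function names, abundances and yields are hypothetical, and significance testing is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def indicator(labels, level):
    """Binary (1.0/0.0) variable marking samples from one site or cultivar."""
    return [1.0 if label == level else 0.0 for label in labels]


# Hypothetical data: relative abundance of one genus in four samples,
# the cultivar of each sample, and the biomass yield of each plot.
abundance = [0.012, 0.015, 0.030, 0.028]
cultivar = ["other", "other", "Otisco", "Otisco"]
biomass_yield = [8.1, 9.0, 12.5, 11.8]

print(round(pearson(abundance, indicator(cultivar, "Otisco")), 3))
print(round(pearson(abundance, biomass_yield), 3))
```

Repeating this calculation for every genus and every variable fills one cell of the heatmap per pair.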

The association heatmap also revealed a set of microbes highly correlated with willow biomass yield. Among the 50 most abundant genera, 13 were significantly correlated with biomass yield, and 11 of these include species reported in previous studies to have plant growth-promoting traits (Table 3.2). These 11 genera are Mycobacterium (Tsavkelova et al. 2005), Methylobacterium (Akio et al., 2015), Rhodococcus (Belimov et al. 2005), Frankia (Richardson et al. 2009), Rhodopseudomonas (Wong et al. 2014), Streptomyces (Tokala et al. 2002), Nitrobacter (Wrage et al. 2001), Bradyrhizobium (Ahemad and Khan, 2012a, b, c, d), Caulobacter (Ahemad and Khan, 2014), Rhodospirillum (Saxena and Tilak, 1998) and Sinorhizobium (Pandey et al. 2007). These findings suggest that representatives of these genera are promising candidate plant growth-promoting rhizobacteria (PGPR). Some of these PGPR produce plant hormones such as IAA, which are essential for root elongation and development (Lata et al., 2006; Xin et al., 2009a, b; Merzaeva and Shirokikh, 2010; Apine and Jadhav, 2011). This trait could facilitate willow root growth in rocky and sandy riparian habitats (Doty et al., 2005, 2009). Other rhizobacteria are capable of nitrification or nitrogen fixation and could therefore promote willow growth in infertile soils (Xin et al., 2009a, b; Knoth et al., 2014). Still other traits, such as induction of systemic resistance and disease suppression, could promote host plant health and resistance to biotic stresses. These 11 genera are promising candidate willow growth-promoting microbes. They need to be verified by in planta tests before being applied in willow breeding and cultivation.

Discussion

The impact of willow planting on dynamics of copiotrophic and oligotrophic phyla

Cataloging complex bacterial communities is an ever-present challenge for microbiologists working in ecological contexts. Great efforts have been made to determine which groups of bacteria are most abundant in different soils and why. Nevertheless, most bacteria remain largely uncharacterized. Under the copiotroph-oligotroph classification, bacteria that grow faster in carbon-rich environments are copiotrophs, whereas bacteria that better tolerate environments with low resource concentrations are oligotrophs (Weber, 1907; Koch, 2001). However, because each bacterial phylum comprises diverse taxa at different phylogenetic levels, this simple classification cannot be applied to every taxon. In Fierer et al. (2007), the net carbon mineralization rate, considered an index of C availability, was a strong predictor of the abundances of Acidobacteria, β-Proteobacteria and Bacteroidetes in soils, but failed to predict the abundances of α-Proteobacteria, Firmicutes and Actinobacteria. In our study, after several years of willow growth at each field trial, we collected willow rhizosphere soil samples. We then compared them with bulk soil samples collected at the same site before willow planting. Rhizosphere soil is often considered rich in organic carbon due to plant rhizodeposition.

Overall, the comparison showed that two dominant phyla, Firmicutes and Actinobacteria, decreased in relative abundance in the willow rhizosphere, whereas the most dominant phylum, Proteobacteria, increased. In Fierer et al. (2007), the abundances of Firmicutes and Actinobacteria could not be predicted from the availability of organic carbon in the soil. Actinobacteria are characterized by their ability to sporulate and become dormant as a strategy to tolerate stress conditions (Naylor et al., 2017). The decrease in Actinobacteria may therefore reflect a mitigation of stress conditions in the soil after willow planting. Proteobacteria and Bacteroidetes have been described as copiotrophs, which are expected to increase in relative abundance in soils with high carbon availability. Consistent with these copiotrophic attributes, both phyla had higher relative abundances in rhizosphere soil in our study, indicating richer carbon resources after willow planting, especially within the rhizosphere. However, the representation of Acidobacteria did not follow the copiotroph-oligotroph model in our study. Because Fierer et al. (2007) categorized Acidobacteria as oligotrophs, the relative abundance of this phylum would be expected to be lower in willow rhizosphere soil than in the relatively C-poor pre-planting bulk soils. On the contrary, Acidobacteria increased its proportion in the microbial community, especially at the Rock Springs (Pennsylvania) site. However, as Fierer et al. also pointed out, the high phylogenetic diversity within each phylum makes it impossible for an entire phylum to share the same ecological attributes. The attributes assigned to Acidobacteria, Proteobacteria and Bacteroidetes therefore cannot be applied to every taxon within these phyla. For example, the autotrophic ammonia oxidizers of the β-Proteobacteria, a group considered copiotrophic, are chemolithoautotrophs that do not fit easily into either category (Bedard and Knowles 1989). Moreover, because of the limitations of shotgun metagenomic sequencing, the abundance of each bacterial phylum is expressed relative to the total microbial community rather than as an absolute abundance. Therefore, we cannot determine whether a change in the relative abundance of a phylum reflects a change in its own absolute abundance or changes in the abundances of other phyla in the community.
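A toy calculation with made-up numbers illustrates this limitation of relative abundance data. In the Python sketch below, the absolute abundance of Acidobacteria is held constant while Firmicutes declines. The relative abundance of Acidobacteria still rises, from 10% to 12.5%. The phylum counts are hypothetical and are not taken from this study.

```python
def relative_abundance(absolute):
    """Convert absolute abundances (taxon -> cell count) to proportions."""
    total = sum(absolute.values())
    return {taxon: n / total for taxon, n in absolute.items()}


# Hypothetical absolute abundances: Acidobacteria stays constant,
# while Firmicutes declines between the two time points.
before = {"Acidobacteria": 100, "Firmicutes": 300, "Proteobacteria": 600}
after = {"Acidobacteria": 100, "Firmicutes": 100, "Proteobacteria": 600}

for label, community in (("before", before), ("after", after)):
    print(label, round(relative_abundance(community)["Acidobacteria"], 3))
```

Distinguishing the two explanations would require an absolute measure, such as total microbial load or a spike-in standard, alongside the sequencing data.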

PGPRs and their application in willow breeding and growing

Plant growth promoting rhizobacteria (PGPRs) inhabit rhizospheres (Kloepper et al., 1991) and promote plant growth, development and health through direct or indirect interactions. Direct positive effects of PGPRs include facilitating soil nutrient cycling (Glick, 1995; Glick, 2012) and regulating plant hormones (Frankenberger and Arshad, 1995). Indirect effects include induction of systemic resistance (Kloepper et al., 1992; Van Loon et al., 1998) and inhibition of soil-borne pathogens (Anjaiah et al., 1998; Rodriguez and Pfender, 1997; Thomashow et al., 1997). PGPR inoculation technologies have already resulted in commercial products in modern agriculture (Berg, 2009). Several genera that include model plant growth promoting microorganisms (PGPMs), whose growth-promoting mechanisms have been well studied, were also highly correlated with willow biomass yield in our study, suggesting promising applications in future willow breeding and cultivation.

Here we identified 11 promising genera that may include PGPRs for willow rhizospheres. Members of the genus Mycobacterium have been characterized as highly active IAA producers. Inoculation of Dendrobium moschatum seeds with Mycobacterium sp. significantly promoted seed germination, even in the absence of the mycorrhizal fungi that are usually required for orchid germination (Tsavkelova et al. 2005, 2007). Methylobacterium is a versatile genus with multiple functions that augment plant growth and seed germination, including phytohormone modulation (production of ACC deaminase, IAA and cytokinin), phosphate solubilization, nitrogen fixation and antagonism against pathogens (Akio et al., 2015). Rhodococcus and Streptomyces are two genera of the phylum Actinobacteria that are reported to have numerous plant growth promoting traits (Bhattacharyya and Jha, 2012; Merzaeva and Shirokikh, 2006). Species in both genera have been used as biocontrol agents against fungal root pathogens (Bhattacharyya and Jha, 2012; Franco-Correa et al., 2010). Frankia, another genus of Actinobacteria, is recognized for its prominent nitrogen-fixing ability, contributing as much as 15% of the total biologically fixed nitrogen worldwide. Rhodopseudomonas spp. are also capable of nitrogen fixation (Cantera et al., 2004) and have been characterized as effective candidates for bioremediation because they can metabolize a wide range of compounds. Nitrobacter spp. play an important role in the nitrogen cycle through nitrification, oxidizing nitrite (toxic to plants) to non-toxic nitrate that plants can assimilate. Bradyrhizobium spp. are additional PGPRs capable of nitrogen fixation, phosphate solubilization and siderophore production (Ahemad and Khan, 2012a, b, c, d). Members of the genus Bradyrhizobium have also shown synergistic effects when co-inoculated with mycorrhizal fungi (Xie et al. 1995). Members of the genus Rhodospirillum produce cytokinins and gibberellins that promote plant growth (Serdyuk et al., 2000; Glick 2012). Sinorhizobium spp. are known for their ability to fix nitrogen (Delić-Vukmir et al. 1994, 2013) and to promote plant growth synergistically when co-inoculated with other PGPRs (Singh et al., 2014). These candidate PGPRs merit further in planta inoculation experiments, either with individual strains or in combination with other PGPRs verified in previous studies.

Conclusion

In our study, both geographic location and host genotype had significant effects on the structure and potential function of rhizosphere microbiomes in willow plantations. The mature willow rhizosphere microbiome appeared to have developed a stable functional core shared among sites, despite variation in its specific taxonomic composition. Most importantly, 11 of the genera most highly correlated with willow biomass yield include species previously reported as plant growth promoting rhizobacteria, suggesting that these genera may play important roles in willow growth and yield.


Methods and Materials

Field sites and plant material

In this study, shrub willow plantations were established at three locations in three states in the northeastern United States: Rock Springs, Pennsylvania (coordinates: 40.699967, -77.962175; soil type: Buchanan channery loam, 8 to 15 percent slopes); Fredonia, New York (coordinates: 42.443133, -79.291738; soil type: VoB, Volusia channery silt loam, 3 to 8 percent slopes); and Mylan Park, West Virginia (coordinates: 39.635661, -80.039110; soil type: Bethesda loam, 8 to 25 percent slopes, reclaimed) (Web Soil Survey 3.3, 2018). The Rock Springs trial, on land classified as marginal because of its low productivity for annual food crops, was planted in July 2013 without fertilization and harvested in January 2016. The Fredonia trial, also on marginal land, was limed before planting in June 2013 and harvested in December 2015. The Mylan Park trial was established on a reclaimed mine site with little to no soil; it was planted in 2014 and has not yet been harvested. No additional fertilizer was applied after planting at any of the three sites.

The plantation trial plans for the three sites are shown in Figure S1. Each site was divided into four replicate blocks in a split-plot design. At both Rock Springs and Fredonia (Figure S1A, B), each block contained 24 rectangular plots (3 rows × 8 columns). At each of these sites, 24 shrub willow cultivars were randomly assigned to the plots, and this was repeated in each of the four blocks. Each plot was planted with 40 cuttings of the assigned cultivar (10 plants per row × 4 rows) in a double-row spacing arrangement. From the 24 cultivars, we chose 12 cultivars common to both sites that represented a range of biomass yields, although yield rankings at the end of the experiment varied greatly between sites. At Mylan Park, each block contained 10 plots, to which 10 shrub willow cultivars were randomly assigned, and this was repeated in each block. Each plot was planted with 48 willows of the assigned cultivar (12 plants per row × 4 rows) in a single-row spacing arrangement. At Mylan Park, soil was collected from only the two cultivars that were also planted at both Rock Springs and Fredonia.


Soil sampling and collection

Pre-planting soil sampling: Before planting, sampling points were chosen randomly at each willow site, and bulk soil samples were collected with a soil corer at a depth of 15-20 cm. Roots were removed by hand, and the soil was immediately placed in sterile bags and thoroughly mixed by shaking. Soil samples were transported to the lab on dry ice and stored at -80°C until total DNA extraction, for which 500 mg of each soil sample was weighed out.

At-harvest soil sampling: Within each plot, one of the four central plants was chosen for soil sampling. Surface residues, including leaf litter and other organic debris, were swept away. Soil samples were taken with a clean shovel 5-10 cm horizontally from the stem crown and 5-20 cm below the ground surface, and willow roots were included in the samples. Each sample was mixed thoroughly by hand in a sterile bag, used to fill a 50 mL Falcon tube (the remainder was discarded), labelled, and transported back to the lab on dry ice. In the laboratory, plant roots were picked out of each sample and shaken vigorously to remove bulk soil. The soil still adhering to the roots, considered rhizosphere soil, was transferred together with the roots into a 2 mL Eppendorf tube to a total mass of about 500 mg. Then 978 µL of sodium phosphate solution (Na3PO4) from the FastDNA® SPIN Kit for Soil (MP Biomedicals, Solon, OH, 44139, USA) was added to each soil and fine-root sample, and the tubes were shaken at 200 rpm for 30 min to wash the soil from the roots. The roots were then discarded, and the rhizosphere slurry was used for DNA extraction.

Soil DNA extraction and sequencing

Total soil DNA was extracted from each sample using the FastDNA® SPIN Kit for Soil (MP Biomedicals, Solon, OH, 44139, USA). DNA yields ranged from 1.13 mg to 5.72 mg per 500 mg of original soil sample. Extracted DNA was stored at -20°C before being sent for sequencing. Shotgun metagenomic sequencing was performed on an Illumina HiSeq 2500 instrument by the Penn State Genomics Core Facility at University Park and by the Singapore Centre for Environmental Life Sciences Engineering at Nanyang Technological University. In total, paired-end 2 x 150 bp shotgun metagenomic sequences were generated for 130 libraries. The sequence data generated in this study will be deposited in the NCBI Sequence Read Archive prior to publication.


Sequence processing and data analysis

The raw FASTQ data were uploaded to the Metagenomics Rapid Annotation using Subsystem Technology (MG-RAST) server and processed through contaminant filtering, quality trimming, de-replication, paired-end read merging, and annotation against the M5NR and M5RNA databases, which integrate many source databases (GenBank, SEED, IMG, UniProt, KEGG, eggNOGs, etc.) in non-redundant form. All taxonomic information from the NCBI taxonomy database is projected onto these annotations, and functional profiles are available for the data sources that provide hierarchical information. This pipeline produced both taxonomic and functional annotation profiles. Rarefaction curves for taxonomic classification were generated on the MG-RAST server separately for pre-planting and at-harvest samples. Taxonomic identifications were retrieved from the Subsystems database and GO functional classifications were retrieved from the RefSeq database to generate taxonomic and functional composition profiles for each of the microbial communities in our study.
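MG-RAST generated these rarefaction curves on its server, and its exact procedure is not described here. The general idea can be sketched in Python, assuming each annotated read carries one taxon label: subsample reads without replacement at increasing depths and record the mean number of distinct taxa observed. A curve that levels off suggests that sequencing depth was sufficient to capture most taxa. The function name, input format and toy data are hypothetical.

```python
import random


def rarefaction_curve(read_labels, depths, n_repeats=10, seed=0):
    """Mean number of distinct taxa seen when `depth` reads are drawn without replacement.

    read_labels: one taxon label per annotated read (a hypothetical input format).
    depths: subsampling depths, each at most len(read_labels).
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        richness = [len(set(rng.sample(read_labels, depth))) for _ in range(n_repeats)]
        curve.append((depth, sum(richness) / n_repeats))
    return curve


# Toy sample with three common and two rare taxa (illustrative only).
reads = ["A"] * 50 + ["B"] * 30 + ["C"] * 15 + ["D"] * 4 + ["E"] * 1
for depth, mean_richness in rarefaction_curve(reads, [5, 20, 50, 100]):
    print(depth, mean_richness)
```

At the full depth of 100 reads every taxon is observed, so the last point is always 5.0; the rare taxa D and E are what keep the curve rising at lower depths.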

Microbiome comparison and statistical analysis

A rarefaction approach was applied to the metagenomic dataset: reads in each sample were subsampled to an even number so that all microbiomes in this study had the same sequencing depth. One outlier sample with far fewer reads (less than 50% of the mean of the other samples) was excluded, and the remaining samples were rarefied to the read depth of the sample with the fewest reads. The Shannon diversity index, an indicator of the richness and evenness of taxa in each soil sample, was calculated using the R package ‘vegan’ (Oksanen 2017). Bray-Curtis distance matrices of taxonomic and functional dissimilarity among all samples were calculated using the ‘vegdist’ function of ‘vegan’. Principal coordinate analysis (PCoA) was performed on the Bray-Curtis distances using the function ‘cmdscale’ from the R package ‘stats’. To quantify the effects of environmental factors and experimental conditions on community composition, permutational multivariate analysis of variance (PERMANOVA) was performed on the Bray-Curtis taxonomic dissimilarity matrices with the ‘adonis’ function in ‘vegan’, using 999 permutations. Associations between environmental variables and community composition were assessed on the Calypso server (Zakrzewski 2017) as Pearson's correlations between quantitative or binary explanatory variables and the relative abundances of microbial taxa, with a significance threshold of p < 0.05.
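The rarefaction, Shannon and Bray-Curtis steps above were run in R with ‘vegan’. As an illustration of what these quantities are, the following Python sketch reimplements them from their standard definitions on made-up counts. It assumes random subsampling without replacement (similar in spirit to vegan's ‘rrarefy’, although the text does not state which function was used) and the natural logarithm for the Shannon index (vegan's default base). The function names are hypothetical, and PCoA and PERMANOVA are not reimplemented here.

```python
import math
import random
from collections import Counter


def rarefy(counts, depth, rng):
    """Randomly subsample `depth` reads without replacement from a taxon -> count table."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i) over all taxa."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator


# Hypothetical phylum-level read counts for three samples (not data from this study).
samples = {
    "s1": {"Proteobacteria": 60, "Acidobacteria": 30, "Bacteroidetes": 10},
    "s2": {"Proteobacteria": 120, "Acidobacteria": 20, "Actinobacteria": 60},
    "s3": {"Proteobacteria": 40, "Acidobacteria": 40, "Bacteroidetes": 20, "Actinobacteria": 20},
}

rng = random.Random(42)  # fixed seed so the subsampling is reproducible
depth = min(sum(c.values()) for c in samples.values())  # smallest library: 100 reads
rarefied = {name: rarefy(c, depth, rng) for name, c in samples.items()}

for name, c in rarefied.items():
    print(name, sum(c.values()), round(shannon(c), 3))
print("Bray-Curtis s1 vs s2:", round(bray_curtis(rarefied["s1"], rarefied["s2"]), 3))
```

After rarefying, every sample has exactly 100 reads, so differences in Shannon index and Bray-Curtis dissimilarity no longer reflect unequal sequencing depth. Bray-Curtis ranges from 0 (identical composition) to 1 (no shared taxa).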


Tables

Table 3. 1 Results of permutational multivariate analysis of variance (PERMANOVA, ‘adonis’) using Bray-Curtis distances of the taxonomically annotated metagenomic data of each soil sample at the species level.

Category level   Degrees of freedom   Sums of squares   Mean squares   F.Model   R2        Pr(>F)
Host             11                   0.454             0.04127        1.2109    0.10102   0.031 *
Location         2                    1.03              0.51498        15.1101   0.22921   0.001 ***
Host:Location    12                   0.3854            0.03212        0.9423    0.08576   0.729
Residuals        77                   2.6243            0.03408                  0.58401
Total            102                  4.4936                                     1

Asterisks indicate the significance level of the p value: * p < 0.05; *** p < 0.001.
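The columns of Table 3.1 are linked by the ANOVA-style identities that ‘adonis’ reports: each mean square is the sum of squares divided by its degrees of freedom, F.Model is a term's mean square divided by the residual mean square, and R2 is a term's sum of squares divided by the total. The short Python check below recomputes these from the tabulated values; small differences in the last digit come from rounding in the table. The p values, Pr(>F), come from the 999 permutations and cannot be recomputed from the table alone.

```python
# Degrees of freedom and sums of squares copied from Table 3.1.
terms = [("Host", 11, 0.454), ("Location", 2, 1.03), ("Host:Location", 12, 0.3854)]
resid_df, resid_ss, total_ss = 77, 2.6243, 4.4936

resid_ms = resid_ss / resid_df  # residual mean square
results = {}
for term, df, ss in terms:
    ms = ss / df               # mean square = sum of squares / degrees of freedom
    f_model = ms / resid_ms    # pseudo-F = term mean square / residual mean square
    r2 = ss / total_ss         # proportion of total variation explained by the term
    results[term] = (ms, f_model, r2)
    print(f"{term:<14} MS={ms:.5f} F.Model={f_model:.4f} R2={r2:.5f}")

# The term and residual degrees of freedom add up to the total (N - 1 = 102).
print(sum(df for _, df, _ in terms) + resid_df)
```

For example, Location explains about 23% of the variation (R2 = 1.03 / 4.4936), with a pseudo-F of about 15.11, matching the tabulated 15.1101 to rounding.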


Table 3. 2 The 11 genera showing the highest correlation with willow biomass yield that are known to include species reported in previous studies to be plant growth promoting rhizobacteria.

Genus Plant growth promoting traits References

Mycobacterium IAA Tsavkelova et al. (2005)

Methylobacterium ACC deaminase, IAA, cytokinin, siderophore, nitrogen fixing, biocontrol Akio et al. (2015)

Rhodococcus ACC deaminase, IAA and siderophores Belimov et al. (2005)

Frankia Nitrogen fixing Richardson et al. (2009)

Rhodopseudomonas IAA Wong et al. (2014)

Streptomyces Suppress Fusarium Tokala et al. (2002)

Nitrobacter Nitrification Wrage et al. (2001)

Bradyrhizobium IAA, siderophores, HCN, ammonia Ahemad and Khan (2012a, b, c, d)

Caulobacter Nitrogen fixing Ahemad and Khan (2014)

Rhodospirillum Nitrogen fixing, cytokinin Saxena and Tilak (1998); Serdyuk et al. (2000)

Sinorhizobium IAA, nitrogen fixing Pandey et al. (2007); Delić-Vukmir et al. (1994)


Table 3. 3 Comparison of the most common genera (core microbiome) observed in willow rhizosphere microbiomes and pre-planting soil microbiomes. A. Genera shared between willow rhizosphere and pre-planting microbiomes; B. Genera unique to willow rhizosphere microbiomes; C. Genera unique to pre-planting microbiomes.

Genus Domain Phylum Class Order Family

A. Shared common genera between willow rhizosphere soil samples and pre-planting samples

Acidobacterium Bacteria Acidobacteria Acidobacteria (class) Acidobacteriales Acidobacteriaceae

Candidatus Koribacter Bacteria Acidobacteria unclassified (derived from Acidobacteria) unclassified (derived from Acidobacteria) unclassified (derived from Acidobacteria)

Candidatus Solibacter Bacteria Acidobacteria Solibacteres Solibacterales Solibacteraceae

Terriglobus Bacteria Acidobacteria Acidobacteria (class) Acidobacteriales Acidobacteriaceae

Amycolatopsis Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Pseudonocardiaceae

Arthrobacter Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Micrococcaceae

Catenulispora Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Catenulisporaceae

Conexibacter Bacteria Actinobacteria Actinobacteria (class) Solirubrobacterales Conexibacteraceae

Corynebacterium Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Corynebacteriaceae

Frankia Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Frankiaceae

Geodermatophilus Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Geodermatophilaceae

Kribbella Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Nocardioidaceae

Micromonospora Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Micromonosporaceae

Mycobacterium Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Mycobacteriaceae

Nakamurella Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Nakamurellaceae

Rhodococcus Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Nocardiaceae

Saccharopolyspora Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Pseudonocardiaceae

Salinispora Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Micromonosporaceae

Streptomyces Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Streptomycetaceae

Streptosporangium Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Streptosporangiaceae

Bacteroides Bacteria Bacteroidetes Bacteroidia Bacteroidales Bacteroidaceae

Chitinophaga Bacteria Bacteroidetes Sphingobacteria Sphingobacteriales unclassified (derived from Sphingobacteriales)

Flavobacterium Bacteria Bacteroidetes Flavobacteria Flavobacteriales Flavobacteriaceae


Mucilaginibacter Bacteria Bacteroidetes Sphingobacteria Sphingobacteriales Sphingobacteriaceae

Pedobacter Bacteria Bacteroidetes Sphingobacteria Sphingobacteriales Sphingobacteriaceae

Chlorobium Bacteria Chlorobi Chlorobia Chlorobiales Chlorobiaceae

Chloroflexus Bacteria Chloroflexi Chloroflexi (class) Chloroflexales Chloroflexaceae

Ktedonobacter Bacteria Chloroflexi Ktedonobacteria Ktedonobacterales Ktedonobacteraceae

Roseiflexus Bacteria Chloroflexi Chloroflexi (class) Chloroflexales Chloroflexaceae

Cyanothece Bacteria Cyanobacteria unclassified (derived from Cyanobacteria) Chroococcales unclassified (derived from Chroococcales)

Synechococcus Bacteria Cyanobacteria unclassified (derived from Cyanobacteria) Chroococcales unclassified (derived from Chroococcales)

Bacillus Bacteria Firmicutes Bacilli Bacillales Bacillaceae

Clostridium Bacteria Firmicutes Clostridia Clostridiales Clostridiaceae

Geobacillus Bacteria Firmicutes Bacilli Bacillales Bacillaceae

Paenibacillus Bacteria Firmicutes Bacilli Bacillales Paenibacillaceae

Gemmata Bacteria Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae

Planctomyces Bacteria Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae

Achromobacter Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae

Acidiphilium Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Acetobacteraceae

Acidovorax Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Acinetobacter Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales Moraxellaceae

Afipia Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Bradyrhizobiaceae

Agrobacterium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Rhizobiaceae

Albidiferax Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Anaeromyxobacter Bacteria Proteobacteria Deltaproteobacteria Myxococcales Myxococcaceae

Azorhizobium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Xanthobacteraceae

Azospirillum Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Rhodospirillaceae

Bordetella Bacteria Proteobacteria Betaproteobacteria Burkholderiales Alcaligenaceae

Bradyrhizobium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Bradyrhizobiaceae

Burkholderia Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae

Caulobacter Bacteria Proteobacteria Alphaproteobacteria Caulobacterales Caulobacteraceae

Cupriavidus Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae


Dechloromonas Bacteria Proteobacteria Betaproteobacteria Rhodocyclales Rhodocyclaceae

Delftia Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Desulfovibrio Bacteria Proteobacteria Deltaproteobacteria Desulfovibrionales Desulfovibrionaceae

Geobacter Bacteria Proteobacteria Deltaproteobacteria Desulfuromonadales Geobacteraceae

Haliangium Bacteria Proteobacteria Deltaproteobacteria Myxococcales Haliangiaceae

Labrenzia Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae

Leptothrix Bacteria Proteobacteria Betaproteobacteria Burkholderiales unclassified (derived from Burkholderiales)

Magnetospirillum Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Rhodospirillaceae

Mesorhizobium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Phyllobacteriaceae

Methylibium Bacteria Proteobacteria Betaproteobacteria Burkholderiales unclassified (derived from Burkholderiales)

Methylobacterium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Methylobacteriaceae

Myxococcus Bacteria Proteobacteria Deltaproteobacteria Myxococcales Myxococcaceae

Nitrobacter Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Bradyrhizobiaceae

Nitrosococcus Bacteria Proteobacteria Gammaproteobacteria Chromatiales Chromatiaceae

Nitrosomonas Bacteria Proteobacteria Betaproteobacteria Nitrosomonadales Nitrosomonadaceae

Ochrobactrum Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Brucellaceae

Oligotropha Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Bradyrhizobiaceae

Pelobacter Bacteria Proteobacteria Deltaproteobacteria Desulfuromonadales Pelobacteraceae

Phenylobacterium Bacteria Proteobacteria Alphaproteobacteria Caulobacterales Caulobacteraceae

Polaromonas Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Pseudomonas Bacteria Proteobacteria Gammaproteobacteria Pseudomonadales Pseudomonadaceae

Ralstonia Bacteria Proteobacteria Betaproteobacteria Burkholderiales Burkholderiaceae

Rhizobium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Rhizobiaceae

Rhodobacter Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae

Rhodopseudomonas Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Bradyrhizobiaceae

Rhodospirillum Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Rhodospirillaceae

Roseobacter Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae

Roseomonas Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Acetobacteraceae

Shewanella Bacteria Proteobacteria Gammaproteobacteria Alteromonadales Shewanellaceae


Sinorhizobium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Rhizobiaceae

Sorangium Bacteria Proteobacteria Deltaproteobacteria Myxococcales Polyangiaceae

Stigmatella Bacteria Proteobacteria Deltaproteobacteria Myxococcales Cystobacteraceae

unclassified (derived from Deltaproteobacteria) Bacteria Proteobacteria Deltaproteobacteria unclassified (derived from Deltaproteobacteria) unclassified (derived from Deltaproteobacteria)

unclassified (derived from Gammaproteobacteria) Bacteria Proteobacteria Gammaproteobacteria unclassified (derived from Gammaproteobacteria) unclassified (derived from Gammaproteobacteria)

Variovorax Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Verminephrobacter Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Vibrio Bacteria Proteobacteria Gammaproteobacteria Vibrionales Vibrionaceae

Xanthobacter Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Xanthobacteraceae

Xanthomonas Bacteria Proteobacteria Gammaproteobacteria Xanthomonadales Xanthomonadaceae

Chthoniobacter Bacteria Verrucomicrobia Spartobacteria unclassified (derived from Spartobacteria) unclassified (derived from Spartobacteria)

Opitutus Bacteria Verrucomicrobia Opitutae unclassified (derived from Opitutae) Opitutaceae

unclassified (derived from Verrucomicrobia subdivision 3) Bacteria Verrucomicrobia Verrucomicrobiales Verrucomicrobia subdivision 3

Verrucomicrobium Bacteria Verrucomicrobia Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae

B. Unique common genera from willow rhizosphere soil samples

Actinosynnema Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Actinosynnemataceae

Nocardioides Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Nocardioidaceae

Thermobispora Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Pseudonocardiaceae

Thermomonospora Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Thermomonosporaceae

Algoriphagus Bacteria Bacteroidetes Cytophagia Cytophagales Cyclobacteriaceae

Cytophaga Bacteria Bacteroidetes Cytophagia Cytophagales Cytophagaceae

Dyadobacter Bacteria Bacteroidetes Cytophagia Cytophagales Cytophagaceae

Prevotella Bacteria Bacteroidetes Bacteroidia Bacteroidales Prevotellaceae

Sphingobacterium Bacteria Bacteroidetes Sphingobacteria Sphingobacteriales Sphingobacteriaceae

Spirosoma Bacteria Bacteroidetes Cytophagia Cytophagales Cytophagaceae

Nostoc Bacteria Cyanobacteria unclassified (derived from Cyanobacteria) Nostocales Nostocaceae


Deinococcus Bacteria Deinococcus-Thermus Deinococci Deinococcales Deinococcaceae

Caldicellulosiruptor Bacteria Firmicutes Clostridia Thermoanaerobacterales Family III. Incertae Sedis

Desulfotomaculum Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae

Lactobacillus Bacteria Firmicutes Bacilli Lactobacillales Lactobacillaceae

Thermaerobacter Bacteria Firmicutes Clostridia Clostridiales Clostridiales Family XVII. Incertae Sedis

Bacteria Firmicutes Clostridia Thermoanaerobacterales Thermoanaerobacteraceae

Nitrospira Bacteria Nitrospirae Nitrospira (class) Nitrospirales Nitrospiraceae

Pirellula Bacteria Planctomycetes Planctomycetacia Planctomycetales Planctomycetaceae

Aromatoleum Bacteria Proteobacteria Betaproteobacteria Rhodocyclales Rhodocyclaceae

Azoarcus Bacteria Proteobacteria Betaproteobacteria Rhodocyclales Rhodocyclaceae

Brevundimonas Bacteria Proteobacteria Alphaproteobacteria Caulobacterales Caulobacteraceae

Brucella Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Brucellaceae

Candidatus Accumulibacter Bacteria Proteobacteria Betaproteobacteria unclassified (derived from Betaproteobacteria) unclassified (derived from Betaproteobacteria)

Chelativorans Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Phyllobacteriaceae

Chromobacterium Bacteria Proteobacteria Betaproteobacteria Neisseriales Neisseriaceae

Comamonas Bacteria Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae

Erythrobacter Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Erythrobacteraceae

Escherichia Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enterobacteriaceae

Gluconacetobacter Bacteria Proteobacteria Alphaproteobacteria Rhodospirillales Acetobacteraceae

Herbaspirillum Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae

Herminiimonas Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae

Hyphomicrobium Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Hyphomicrobiaceae

Janthinobacterium Bacteria Proteobacteria Betaproteobacteria Burkholderiales Oxalobacteraceae

Legionella Bacteria Proteobacteria Gammaproteobacteria Legionellales Legionellaceae

Lutiella Bacteria Proteobacteria Betaproteobacteria Neisseriales Neisseriaceae

Marinobacter Bacteria Proteobacteria Gammaproteobacteria Alteromonadales Alteromonadaceae

Methylobacillus Bacteria Proteobacteria Betaproteobacteria Methylophilales Methylophilaceae

Methylocella Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Beijerinckiaceae

Methylotenera Bacteria Proteobacteria Betaproteobacteria Methylophilales Methylophilaceae


Methylovorus Bacteria Proteobacteria Betaproteobacteria Methylophilales Methylophilaceae

Neisseria Bacteria Proteobacteria Betaproteobacteria Neisseriales Neisseriaceae

Nitrosospira Bacteria Proteobacteria Betaproteobacteria Nitrosomonadales Nitrosomonadaceae

Novosphingobium Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae

Parvibaculum Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Phyllobacteriaceae

Ruegeria Bacteria Proteobacteria Alphaproteobacteria Rhodobacterales Rhodobacteraceae

Sideroxydans Bacteria Proteobacteria Betaproteobacteria Gallionellales Gallionellaceae

Sphingobium Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae

Sphingomonas Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae

Starkeya Bacteria Proteobacteria Alphaproteobacteria Rhizobiales Xanthobacteraceae

Stenotrophomonas Bacteria Proteobacteria Gammaproteobacteria Xanthomonadales Xanthomonadaceae

Thauera Bacteria Proteobacteria Betaproteobacteria Rhodocyclales Rhodocyclaceae

Thioalkalivibrio Bacteria Proteobacteria Gammaproteobacteria Chromatiales Ectothiorhodospiraceae

Thiobacillus Bacteria Proteobacteria Betaproteobacteria Hydrogenophilales Hydrogenophilaceae

unclassified (derived from Alphaproteobacteria) Bacteria Proteobacteria Alphaproteobacteria unclassified (derived from Alphaproteobacteria) unclassified (derived from Alphaproteobacteria)

Yersinia Bacteria Proteobacteria Gammaproteobacteria Enterobacteriales Enterobacteriaceae

Leptospira Bacteria Spirochaetes Spirochaetes (class) Spirochaetales Leptospiraceae

Akkermansia Bacteria Verrucomicrobia Verrucomicrobiae Verrucomicrobiales Verrucomicrobiaceae

Coraliomargarita Bacteria Verrucomicrobia Opitutae Puniceicoccales Puniceicoccaceae

Methylacidiphilum Bacteria Verrucomicrobia unclassified (derived from Verrucomicrobia) Methylacidiphilales Methylacidiphilaceae

unclassified (derived from Opitutaceae) Bacteria Verrucomicrobia Opitutae unclassified (derived from Opitutae) Opitutaceae

C. Unique common genera from pre-planting soil samples

Nocardia Bacteria Actinobacteria Actinobacteria (class) Actinomycetales Nocardiaceae

Desulfitobacterium Bacteria Firmicutes Clostridia Clostridiales Peptococcaceae


Figures

Figure 3. 1 Geographic map of the locations of the three willow trials in the northeastern United States used in this study. Sample sites were distributed among three northeastern states and are indicated by yellow symbols (Fredonia site in New York, triangle; Rock Springs site in Pennsylvania, circle; Mylan Park site in West Virginia, star).


Figure 3. 2 Principal coordinate plots of total metagenomic DNA data for all samples, generated using Bray-Curtis distances at a) the species level and b) the functional level. Samples are colored by geographic site: red is NY, green is PA and blue is WV. Squares are samples collected before willow planting, and dots are willow rhizosphere soil samples collected at harvest after 2 or 3 years of growth. a) At the species level, samples from different locations grouped apart from each other at harvest, whereas pre-planting samples were distributed across the plot and did not cluster by location. b) At the functional level, a core set of GO functional gene categories clustered together in the harvest samples, with the two well-established sites in PA and NY clustering together and the poorly established WV metagenomes grouping separately. Functional gene categories in the pre-planting metagenomic data did not cluster significantly.


Figure 3. 3 Boxplots of the Shannon diversity indices of the soil microbiomes, grouped by geographic location and collection time. Samples are colored by geographic site: red is NY, green is PA and blue is WV. The y-axis shows the Shannon diversity index of the microbial community in each sample.


Figure 3. 4 Bar chart of microbial community composition at the phylum level for all soil DNA samples, showing the relative abundance of each phylum in each sample. Each column is a soil sample; the x-axis lists the samples and the y-axis shows relative abundance, with each color representing a phylum. On the right side, samples collected before willow planting are ordered by geographic location (from left to right: Fredonia-NY, Rock Springs-PA, Mylan Park-WV). On the left side are willow rhizosphere soil samples collected at biomass harvest after years of willow growth, ordered by geographic location (from left to right: Fredonia-NY, Rock Springs-PA, Mylan Park-WV). The legend lists the major phyla, i.e., those abundant across all 130 microbiomes or present at a significantly higher proportion at one site than at the other two.


Figure 3. 5 Heatmap of the relative abundances of core microbial genera observed in the shrub willow rhizosphere microbiome samples only at harvest (and not before planting). The x-axis lists the genera unique to the core microbiomes at harvest, organized taxonomically by phylum. The y-axis lists individual microbiome samples as rows, with the three sites identified on the right by color: red is NY, green is PA and blue is WV. The host genotype of each sample is indicated by black bars in the colored boxes on the right.


Figure 3. 6 Heatmap of the relative abundances of GO functional gene attributes (annotated at the KEGG pathway level) across all shrub willow rhizosphere microbiomes. Each row is a KEGG pathway, labelled on the y-axis. Columns represent individual soil metagenomes, with labels above showing field site locations. Columns are ordered according to a cluster analysis comparing overall metagenome sequence composition. Note that the metagenome data did not cluster strictly by location or time of sampling.


Figure 3. 7 Heatmap of Pearson’s correlation coefficients between the abundances of the top 100 genera (x-axis) and the genetic or environmental variables: the twelve host genotypes, two geographic locations (Rock Springs and Fredonia) and willow biomass yield. Positive correlations are shown in red, negative correlations in blue, and no correlation in yellow.


Supplementary tables and figures

Table S3. 1 Pedigree metadata of the twelve willow genotypes in this study.

Clone ID Epithet Species Mother Father Gender Diversity group Ploidy

94006 S. purpurea F 6a 2X

01X-271-009 S. viminalis x S. miyabeana SV7 9970-037 F 8 3X

02X-326-015 (S. miyabeana x S. viminalis) x (S. schwerinii x S. viminalis) 9970-21 Olof M 3X

05X-291-049 S. purpurea x S. miyabeana 00-01-088 SX67 F 9 3X

9882-34 Fish Creek S. purpurea 94006 94001 M 6a 2X

99202-004 Fabius S. viminalis x S. miyabeana SV2 SX67 F 8 3X

99217-015 Millbrook S. purpurea x S. miyabeana 95026 SX64 F 9 3X

99201-007 Otisco S. viminalis x S. miyabeana SV2 SX64 F 8 3X

SX61 S. miyabeana F 5 4X

99217-023 Saratoga S. purpurea x S. miyabeana 95026 SX64 F 9 3X

LA970253 S. viminalis x S. miyabeana F 8 3X

01X-268-015 Preble S. viminalis x S. miyabeana SV2 9970-037 F 8 3X

[Figure S3.1 plot maps. Panel A: Rock Springs, PA (2012-2015); Panel B: Fredonia, NY (2013-2015); Panel C: Mylan Park (2014-?). Each panel shows the positions of willow genotypes within replicate blocks Rep 1-4.]

Figure S3.1 Field trial planting designs for Rock Springs (Panel A), Fredonia (Panel B), and Mylan Park (Panel C).

Figure S3.2 Bar plot of willow biomass yield for each of the 12 willow cultivars at two geographic sites. Yield rankings of the same cultivars varied widely between sites. Yellow indicates Rock Springs and blue indicates Fredonia.

Figure S3.3 Taxonomic rarefaction curves for A) willow rhizosphere soil samples and B) pre-planting soil communities, based on taxonomic annotation against the integrated nonredundant M5NR and M5RNA databases. All willow rhizosphere soil samples but one (RS01X_09R3: Rock Springs, willow genotype 01X-015-009) reach a plateau on their rarefaction curves, which indicates that sequencing depth was sufficient to capture the diversity of each microbiome community. The one sample whose rarefaction curve did not reach a plateau was considered not comparable with the other samples and was excluded from subsequent analyses.
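Whatever tool produced these curves, the underlying idea can be sketched with the classic analytic rarefaction formula (Hurlbert's). This formula gives the expected number of taxa observed when a given number of reads is drawn without replacement. The sketch below is illustrative and is not the implementation used by the annotation server. The input is assumed to be per-taxon read counts for a single sample.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of taxa observed when `depth` reads are drawn without
    replacement from a sample with per-taxon read counts `counts`
    (analytic rarefaction, Hurlbert's formula)."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    all_draws = comb(total, depth)
    # P(taxon i is missed) = C(total - n_i, depth) / C(total, depth);
    # math.comb returns 0 when depth > total - n_i (taxon cannot be missed).
    return sum(1 - comb(total - n_i, depth) / all_draws
               for n_i in counts if n_i > 0)


def rarefaction_curve(counts, step):
    """(depth, expected richness) points from 0 up to the full read count."""
    total = sum(counts)
    return [(d, expected_richness(counts, d)) for d in range(0, total + 1, step)]


# Toy sample: three taxa with 5, 3 and 2 reads (hypothetical counts).
curve = rarefaction_curve([5, 3, 2], step=1)
```

A curve has reached a plateau when additional depth adds almost no expected taxa, so that the last increments in `curve` are close to zero. For real metagenomes with millions of reads, the exact binomial coefficients become very large and slow to compute. In that case, a log-gamma formulation or repeated random subsampling would be more practical.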

Figure S3.4 Boxplot of the distribution of Shannon diversity indices of willow rhizosphere soil microbiomes, grouped by geographic location and host genotype. All samples in this plot were collected from the willow rhizosphere at harvest time. Samples are colored by geographic site: red is Fredonia, NY; green is Rock Springs, PA; and blue is Mylan Park, WV.
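The Shannon index summarizes both richness and evenness as H' = -Σ p_i ln p_i, where p_i is the relative abundance of taxon i. The sketch below assumes natural logarithms; some tools use log base 2, which rescales H'. It takes per-taxon counts or relative abundances for one sample as input.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i) from per-taxon counts or
    relative abundances; taxa with zero abundance contribute nothing."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive number")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


# Two equally abundant taxa give ln 2; a single taxon gives 0.
even_pair = shannon_index([10, 10])
single = shannon_index([7])
```

Because H' rises with both the number of taxa and how evenly reads are spread among them, samples with similar richness can still differ in H' if one is dominated by a few taxa.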

Figure S3.5 Top 100 biomarker candidates whose relative abundances distinguished Fredonia (right) from Rock Springs (left).

Figure S3.6 Bar chart of microbiome community composition profiles at the family level for all samples. Each column is a soil sample. The x-axis lists all the samples (from top to bottom: Mylan Park_pre-planting; Rock Springs_pre-planting; Fredonia_pre-planting; Mylan Park_at-harvest; Rock Springs_at-harvest; Fredonia_at-harvest). The y-axis shows the relative abundance of each family within a microbiome.

CHAPTER 4

Summary and Future Prospects

Willows show great promise as a feedstock for the bioenergy industry. They are easily propagated from cuttings and established in short-rotation coppice plantations. They also grow fast, produce high biomass yields, and require little management and maintenance. Moreover, willows adapt to non-arable land, so they do not compete with traditional agriculture, and they improve land condition after cultivation. These advantages make willow one of the most promising sources of sustainable biomass feedstock. Decades of study have enriched our knowledge of the genetic and genomic background of willows. However, willows are genetically diverse, and this diversity remains underexploited. Currently, gains in yield and pest/disease resistance in willows are achieved primarily through conventional breeding, and the number of genetically mapped traits and features is still limited. This restricts the development of new willow cultivars with improved performance in various respects. In addition, willows show great phenotypic plasticity when grown under different environmental conditions, such as differences in climate, water availability, and temperature. Phenotypic plasticity can affect plant fitness and productivity. Characterizing this plasticity and linking willow phenomic data collected in the field with genomic information may help us understand the genetic regulation of key breeding traits and optimize the selection of cultivars best suited to different geographic locations and environmental conditions. The goal of future willow research will be to extend our understanding of genotype-by-environment interactions between shrub willow and environmental factors, and of genotype-phenotype associations, through genetic methodology and genomic technology.

As sequencing technology advances and new versions of the Salix genome sequence and annotation become available, we foresee that genomic studies of shrub willow and its interactions with the environment will drive our understanding and deployment of valuable traits for willow breeding. Maturing long-read RNA-Seq technology offers several advantages over short-read sequencing. For transcriptome analysis, long reads will greatly improve the accuracy of unambiguous mapping to the reference and mitigate multi-mapping bias. They also provide much more information on alternative splicing, which helps to characterize complex RNA biosynthesis and dynamics. In addition, long-read RNA-Seq will advance studies of SNP and antisense transcript detection, transcriptome assembly, and allele-specific gene expression in shrub willow (Zhao et al., 2016). With long-read RNA sequencing, we can anticipate a more complete and detailed picture of the willow transcriptome.

There is increasing evidence that microbes or their metabolites can be used to enhance nutrient uptake and improve yield, control diseases and pests, and alleviate plant stress. As a result, plant-associated microbiomes have come to light as having great potential to improve plant resilience under stress and to increase crop yields through breeding ‘microbe-optimized plants’ (Trivedi et al., 2017). In our study, 11 genera stood out for their high correlations with biomass yield, and these genera have displayed validated plant growth-promoting traits in previous research on other plants. To further test their roles in willow growth, in vivo inoculation experiments will be established, introducing the 11 genera into the willow rhizosphere both individually and in combination. Phenotypic observation provides a direct reflection of microbiome effects. Beyond it, dual RNA-Seq of both the rhizosphere metatranscriptome and the plant root (as well as other vegetative tissues essential for biomass production) will allow plant and microbiome transcriptomes to be captured simultaneously, giving us greater insight into plant host-soil microbiome interactions.

Within any microbial community, individual microbes are entangled in a very complex network, acting in synergy or antagonism. Functional redundancy has also been found in various microbiota, including the human gut microbiota and soil ecosystems. This suggests that, at the functional level, species are to some extent interchangeable within a microbial community. Thus, to understand the role a microbiome plays in host plant function, it is better to view the microbiome as a whole, as if it were a single very large genome. Network theory, previously used to analyze proteomic and transcriptomic data within an organism or even a single cell, should therefore be applicable to modeling complex, multifaceted microbiota (Layeghifard et al., 2016).
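To make the network view concrete, a simple co-occurrence network can be built in which taxa are nodes and an edge joins two taxa whose abundance profiles across samples are strongly correlated. The sketch below uses plain Pearson correlation and an arbitrary, hypothetical cut-off (`threshold=0.8`). Because relative-abundance data are compositional, plain correlations can be spurious. In practice, compositionality-aware methods such as SparCC are generally preferred.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cooccurrence_network(abundance, threshold=0.8):
    """Adjacency dict {taxon: {neighbor: r}} linking every pair of taxa whose
    abundance profiles across samples correlate with |r| >= threshold.
    r > 0 suggests co-occurrence (possible synergy); r < 0 suggests mutual
    exclusion (possible antagonism)."""
    graph = {taxon: {} for taxon in abundance}
    for a, b in combinations(abundance, 2):
        r = pearson_r(abundance[a], abundance[b])
        if abs(r) >= threshold:
            graph[a][b] = r
            graph[b][a] = r
    return graph


# Hypothetical abundance profiles across four samples.
profiles = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8],
            "C": [4, 3, 2, 1], "D": [1, 3, 1, 3]}
net = cooccurrence_network(profiles)
degree = {taxon: len(nbrs) for taxon, nbrs in net.items()}
```

Taxa with many connections (high `degree`) are candidate hub taxa. Treating the community as one interacting system in this way, rather than as a list of independent genera, is the kind of whole-microbiome view described above.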

References

Abrahamson, L., Volk, T., Smart, L., & Cameron, K. (2010). Shrub willow biomass producer’s handbook. College of Environmental Science and Forestry, Syracuse, NY, USA. Retrieved from http://www.esf.edu/willow/documents/ProducersHandbook.pdf

Ahemad, M., & Khan, M. (2010a). Phosphate solubilizing Enterobacter asburiae strain PS2. Afr J Microbiol Res, 5, 849-857.

Ahemad, M., & Khan, M. S. (2010b). Comparative toxicity of selected insecticides to pea plants and growth promotion in response to insecticide-tolerant and plant growth promoting Rhizobium leguminosarum. Crop protection, 29(4), 325-329.

Ahemad, M., & Khan, M. S. (2010c). Phosphate-solubilizing and plant-growth-promoting Pseudomonas aeruginosa PS1 improves greengram performance in quizalafop-p-ethyl and clodinafop amended soil. Archives of environmental contamination and toxicology, 58(2), 361-372.

Ahemad, M., & Khan, M. S. (2010d). Plant growth promoting activities of phosphate-solubilizing Enterobacter asburiae as influenced by fungicides. EurAsian J BioSci, 4, 88-95.

Ahemad, M., & Kibret, M. (2014). Mechanisms and applications of plant growth promoting rhizobacteria: current perspective. Journal of King Saud University-Science, 26(1), 1-20.

Aleklett, K., Leff, J. W., Fierer, N., & Hart, M. (2015). Wild plant species growing closely connected in a subalpine meadow host distinct root-associated bacterial communities. PeerJ, 3, e804.

Apine, O., & Jadhav, J. (2011). Optimization of medium for indole‐3‐acetic acid production using Pantoea agglomerans strain PVM. Journal of applied microbiology, 110(5), 1235-1244.

Argus, G. W. (1997). Infrageneric classification of Salix (Salicaceae) in the new world. Systematic botany monographs, 1-121.

Argus, G. W. (1999). Classification of Salix in the new world. Botanical Electronic News, 227, 1-6.

Argus, G. W. (2010). Salix L. In Flora of North America Editorial Committee (Ed.), Flora of North America North of Mexico, Volume 7. Magnoliophyta: Salicaceae to Brassicaceae (pp. 23-162). Oxford: Oxford University Press.

Backus, E. A., & Hunter, W. B. (1989). Comparison of feeding behavior of the potato leafhopper Empoasca fabae (Homoptera: Cicadellidae) on alfalfa and broad bean leaves. Environmental Entomology, 18(3), 473-480.

Backus, E. A., Serrano, M. S., & Ranger, C. M. (2005). Mechanisms of hopperburn: an overview of insect taxonomy, behavior, and physiology. Annu. Rev. Entomol., 50, 125-151.

Badri, D. V., Chaparro, J. M., Zhang, R., Shen, Q., & Vivanco, J. M. (2013). Application of natural blends of phytochemicals derived from the root exudates of Arabidopsis to the soil reveal that phenolic-related compounds predominantly modulate the soil microbiome. Journal of Biological Chemistry, 288(7), 4502-4512.

Bais, H. P., Weir, T. L., Perry, L. G., Gilroy, S., & Vivanco, J. M. (2006). The role of root exudates in rhizosphere interactions with plants and other organisms. Annu. Rev. Plant Biol., 57, 233-266.

Belimov, A., Hontzeas, N., Safronova, V., Demchinskaya, S., Piluzza, G., Bullitta, S., & Glick, B. (2005). Cadmium-tolerant plant growth-promoting bacteria associated with the roots of Indian mustard (Brassica juncea L. Czern.). Soil Biology and Biochemistry, 37(2), 241-250.

Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the royal statistical society. Series B (Methodological), 289-300.

Berg, G. (2009). Plant–microbe interactions promoting plant growth and health: perspectives for controlled use of microorganisms in agriculture. Applied microbiology and biotechnology, 84(1), 11-18.

Berlin, S., Ghelardini, L., Bonosi, L., Weih, M., & Rönnberg-Wästljung, A. C. (2014). QTL mapping of biomass and nitrogen economy traits in willows (Salix spp.) grown under contrasting water and nutrient conditions. Molecular breeding, 34(4), 1987-2003.

Bouwmeester, H. J., Roux, C., Lopez-Raez, J. A., & Becard, G. (2007). Rhizosphere communication of plants, parasitic plants and AM fungi. Trends in plant science, 12(5), 224-230.

Brusetti, L., Francia, P., Bertolini, C., Pagliuca, A., Borin, S., Sorlini, C., . . . Giovannetti, L. (2005). Bacterial communities associated with the rhizosphere of transgenic Bt 176 maize (Zea mays) and its non transgenic counterpart. Plant and Soil, 266(1-2), 11-21.

Buchholz, T., & Volk, T. (2011). Identifying opportunities to improve the profitability of willow biomass crops with a crop budget model. Bioenergy Res, 4(2), 85-95.

Bulgarelli, D., Garrido-Oter, R., Münch, P. C., Weiman, A., Dröge, J., Pan, Y., . . . Schulze-Lefert, P. (2015). Structure and function of the bacterial root microbiota in wild and domesticated barley. Cell host & microbe, 17(3), 392-403.

Bulgarelli, D., Rott, M., Schlaeppi, K., van Themaat, E. V. L., Ahmadinejad, N., Assenza, F., . . . Schmelzer, E. (2012). Revealing structure and assembly cues for Arabidopsis root-inhabiting bacterial microbiota. Nature, 488(7409), 91-95.

Bulgarelli, D., Schlaeppi, K., Spaepen, S., van Themaat, E. V. L., & Schulze-Lefert, P. (2013). Structure and functions of the bacterial microbiota of plants. Annual review of plant biology, 64, 807-838.

Burke, C., Steinberg, P., Rusch, D., Kjelleberg, S., & Thomas, T. (2011). Bacterial community assembly based on functional genes rather than species. Proceedings of the National Academy of Sciences, 108(34), 14288-14293.

Cameron, K. D., Phillips, I. S., Kopp, R. F., Volk, T. A., Maynard, C. A., Abrahamson, L. P., & Smart, L. B. (2008). Quantitative Genetics of Traits Indicative of Biomass Production and Heterosis in 34 Full-sib F1 Salix eriocephala Families. Bioenergy Research, 1(1), 80-90.

Carlson, C. H., Choi, Y., Chan, A. P., Serapiglia, M. J., Town, C. D., & Smart, L. B. (2017). Dominance and sexual dimorphism pervade the Salix purpurea L. transcriptome. Genome biology and evolution, 9(9), 2377-2394.

Chasen, E. M., Dietrich, C., Backus, E. A., & Cullen, E. M. (2014). Potato leafhopper (Hemiptera: Cicadellidae) ecology and integrated pest management focused on alfalfa. Journal of Integrated Pest Management, 5(1), A1-A8.

Dai, X., Hu, Q., Cai, Q., Feng, K., Ye, N., Tuskan, G. A., . . . Wang, Z. (2014). The willow genome and divergent evolution from poplar after the common genome duplication. Cell research, 24(10), 1274.

Dar, G. H. (2009). Soil microbiology and biochemistry: New India Publishing.

Delić, D., Stajković-Srbinović, O., Kuzmanović, D., Rasulić, N., Maksimović, S., Radović, J., & Simić, A. (2013). Influence of plant growth promoting rhizobacteria on alfalfa, Medicago sativa L. yield by inoculation of a preceding Italian ryegrass, Lolium multiflorum Lam. In Breeding strategies for sustainable forage and turf grass improvement (pp. 333-339): Springer.

Desgarennes, D., Garrido, E., Torres-Gomez, M. J., Pena-Cabriales, J. J., & Partida-Martinez, L. P. (2014). Diazotrophic potential among bacterial communities associated with wild and cultivated Agave species. FEMS microbiology ecology, 90(3), 844-857.

De Vrije, T., Antoine, N., Buitelaar, R., Bruckner, S., Dissevelt, M., Durand, A., . . . Oostra, J. (2001). The fungal biocontrol agent Coniothyrium minitans: production by solid-state fermentation, application and marketing. Applied microbiology and biotechnology, 56(1-2), 58-68.

Dimitriou, I., Aronsson, P., & Weih, M. (2006). Stress tolerance of five willow clones after irrigation with different amounts of landfill leachate. Bioresource technology, 97(1), 150-157.

Doty, S. L., Dosher, M. R., Singleton, G. L., Moore, A. L., Van Aken, B., Stettler, R. F., . . . Gordon, M. P. (2005). Identification of an endophytic Rhizobium in stems of Populus. Symbiosis, 39(1), 27-35.

Doty, S. L., Oakley, B., Xin, G., Kang, J. W., Singleton, G., Khan, Z., . . . Staley, J. T. (2009). Diazotrophic endophytes of native black cottonwood and willow. Symbiosis, 47(1), 23-33.

Du, Z., Zhou, X., Ling, Y., Zhang, Z., & Su, Z. (2010). agriGO: a GO analysis toolkit for the agricultural community. Nucleic acids research, 38(suppl_2), W64-W70.

Ecale, C. L., & Backus, E. A. (1995a). Mechanical and salivary aspects of potato leafhopper probing in alfalfa stems. Entomologia experimentalis et applicata, 77(2), 121-132.

Ecale, C. L., & Backus, E. A. (1995b). Time course of anatomical changes to stem vascular tissues of alfalfa, Medicago sativa, from probing injury by the potato leafhopper, Empoasca fabae. Canadian Journal of Botany, 73(2), 288-298.

Eisenhauer, N., Milcu, A., Sabais, A. C., Bessler, H., Brenner, J., Engels, C., . . . Roscher, C. (2011). Plant diversity surpasses plant functional groups and plant productivity as driver of soil biota in the long term. PLoS One, 6(1), e16055.

Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., & Mitchell, S. E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS one, 6(5), e19379.

Etzler, M. E., & Esko, J. D. (2009). Free glycans as signaling molecules.

Fabio, E. S., Kemanian, A. R., Montes, F., Miller, R. O., & Smart, L. B. (2017). A mixed model approach for evaluating yield improvements in interspecific hybrids of shrub willow, a dedicated bioenergy crop. Industrial Crops and Products, 96, 57-70.

Fabio, E. S., Volk, T. A., Miller, R. O., Serapiglia, M. J., Gauch, H. G., Van Rees, K. C., . . . Labrecque, M. (2017). Genotype × environment interaction analysis of North American shrub willow yield trials confirms superior performance of triploid hybrids. GCB Bioenergy, 9(2), 445-459.

Faoro, H., Alves, A., Souza, E., Rigo, L., Cruz, L., Al-Janabi, S., . . . Pedrosa, F. (2010). Influence of soil characteristics on the diversity of bacteria in the Southern Brazilian Atlantic Forest. Applied and environmental microbiology, 76(14), 4744-4749.

Fierer, N., Bradford, M. A., & Jackson, R. B. (2007). Toward an ecological classification of soil bacteria. Ecology, 88(6), 1354-1364.

Fierer, N., & Jackson, R. B. (2006). The diversity and biogeography of soil bacterial communities. Proceedings of the National Academy of Sciences of the United States of America, 103(3), 626-631.

Fierer, N., Leff, J. W., Adams, B. J., Nielsen, U. N., Bates, S. T., Lauber, C. L., . . . Caporaso, J. G. (2012). Cross-biome metagenomic analyses of soil microbial communities and their functional attributes. Proceedings of the National Academy of Sciences, 109(52), 21390-21395.

Fogelqvist, J., Verkhozina, A. V., Katyshev, A. I., Pucholt, P., Dixelius, C., Rönnberg-Wästljung, A. C., . . . Berlin, S. (2015a). Genetic and morphological evidence for introgression between three species of willows. BMC evolutionary biology, 15(1), 193.

Fogelqvist, J., Verkhozina, A. V., Katyshev, A. I., Pucholt, P., Dixelius, C., Rönnberg-Wästljung, A. C., . . . Berlin, S. (2015b). Genetic and morphological evidence for introgression between three species of willows. BMC evolutionary biology, 15(1), 193.

Franken, P. (2012). The plant strengthening root endophyte Piriformospora indica: potential application and the biology behind. Applied microbiology and biotechnology, 96(6), 1455-1464.

Frey, S. D., Knorr, M., Parrent, J. L., & Simpson, R. T. (2004). Chronic nitrogen enrichment affects the structure and function of the soil microbial community in temperate hardwood and pine forests. Forest Ecology and Management, 196(1), 159-171.

Fulton, L. M., Lynd, L. R., Körner, A., Greene, N., & Tonachel, L. R. (2015). The need for biofuels as part of a low carbon energy future. Biofuels, Bioproducts and Biorefining, 9(5), 476-483.

Girvan, M. S., Bullimore, J., Pretty, J. N., Osborn, A. M., & Ball, A. S. (2003). Soil type is the primary determinant of the composition of the total and active bacterial communities in arable soils. Applied and Environmental Microbiology, 69(3), 1800-1809.

Glick, B. R. (2012). Plant growth-promoting bacteria: mechanisms and applications. Scientifica, 2012.

Gorbunova, V., & Levy, A. A. (1997). Non-homologous DNA end joining in plant cells is associated with deletions and filler DNA insertions. Nucleic Acids Research, 25(22), 4650-4657.

Gupta, A., Saxena, A., Gopal, M., & Tilak, K. (1998). Effect of plant growth promoting rhizobacteria on competitive ability of introduced Bradyrhizobium sp.(Vigna) for nodulation. Microbiological research, 153(2), 113-117.

Han, K., & Lincoln, D. E. (1994). The evolution of carbon allocation to plant secondary metabolites: a genetic analysis of cost in Diplacus aurantiacus. Evolution, 48(5), 1550-1563.

Hanley, S., Barker, J., Van Ooijen, J., Aldam, C., Harris, S., Åhman, I., . . . Karp, A. (2002). A genetic linkage map of willow (Salix viminalis) based on AFLP and microsatellite markers. TAG Theoretical and Applied Genetics, 105(6), 1087-1096.

Hanley, S., Mallott, M., & Karp, A. (2006). Alignment of a Salix linkage map to the Populus genomic sequence reveals macrosynteny between willow and poplar genomes. Tree Genetics & Genomes, 3(1), 35-48.

Hanley, S. J. (2003). Genetic mapping of important agronomic traits in biomass willow. University of Bristol.

Hanley, S. J., & Karp, A. (2013). Genetic strategies for dissecting complex traits in biomass willows (Salix spp.). Tree physiology, 34(11), 1167-1180.

Harris, N., Hagen, S., Saatchi, S., Pearson, T., Woodall, C. W., Domke, G. M., . . . Salas, W. (2016). Attribution of net carbon change by disturbance type across forest lands of the conterminous United States. Carbon balance and management, 11(1), 24.

Heaton, E. A., Dohleman, F. G., & Long, S. P. (2008). Meeting US biofuel goals with less land: the potential of Miscanthus. Global change biology, 14(9), 2000-2014.

Heldt, H., & Piechulla, B. (2011). Sulfate assimilation enables the synthesis of sulfur containing compounds. In Plant Biochemistry. San Diego: Academic Press.

Heldt, H.-W., & Piechulla, B. (2004). Plant biochemistry: Academic Press.

Herms, D. A., & Mattson, W. J. (1992). The dilemma of plants: to grow or defend. The quarterly review of biology, 67(3), 283-335.

Hooper, D. U., Bignell, D. E., Brown, V. K., Brussard, L., Dangerfield, J. M., Wall, D. H., . . . Lavelle, P. (2000). Interactions between Aboveground and Belowground Biodiversity in Terrestrial Ecosystems: Patterns, Mechanisms, and Feedbacks. BioScience, 50(12), 1049-1061.

Hu, J.-J., Lv, J.-H., & Lu, M.-Z. (2011). Genetic linkage map of willow (Salix leucopithecia × S. erioclada L) based on AFLP and SSR markers. Paper presented at the BMC Proceedings.

Hulsen, T., de Vlieg, J., & Alkema, W. (2008). BioVenn–a web application for the comparison and visualization of biological lists using area-proportional Venn diagrams. BMC genomics, 9(1), 488.

Huseth, A. S., Groves, R. L., Chapman, S. A., Alyokhin, A., Kuhar, T. P., Macrae, I. V., . . . Nault, B. A. (2014). Managing Colorado potato beetle insecticide resistance: new tools and strategies for the next decade of pest control in potato. Journal of Integrated Pest Management, 5(4), A1-A8.

Jin, J., He, K., Tang, X., Li, Z., Lv, L., Zhao, Y., . . . Gao, G. (2015). An Arabidopsis transcriptional regulatory map reveals distinct functional and evolutionary features of novel transcription factors. Molecular biology and evolution, 32(7), 1767-1773.

Jones, D. L., Nguyen, C., & Finlay, R. D. (2009). Carbon flow in the rhizosphere: carbon trading at the soil–root interface. Plant and Soil, 321(1-2), 5-33.

Kabrick, L. R., & Backus, E. A. (1990). Salivary deposits and plant damage associated with specific probing behaviors of the potato leafhopper, Empoasca fabae, on alfalfa stems. Entomologia Experimentalis et Applicata, 56(3), 287-304.

Karban, R., Agrawal, A. A., & Mangel, M. (1997). The benefits of induced defenses against herbivores. Ecology, 78(5), 1351-1355.

Karp, A. (2014). Willows as a source of renewable fuels and diverse products. In Challenges and Opportunities for the World's Forests in the 21st Century (pp. 617-641): Springer.

Karp, A., Hanley, S. J., Trybush, S. O., Macalpine, W., Pei, M., & Shield, I. (2011a). Genetic improvement of willow for bioenergy and biofuels. Journal of integrative plant biology, 53(2), 151-165.

Karp, A., Hanley, S. J., Trybush, S. O., Macalpine, W., Pei, M., & Shield, I. (2011b). Genetic improvement of willow for bioenergy and biofuels. Journal of integrative plant biology, 53(2), 151-165.

Karp, A., & Shield, I. (2008). Bioenergy from plants and the sustainable yield challenge. New Phytologist, 179(1), 15-32.

Kempel, A., Schädler, M., Chrobock, T., Fischer, M., & van Kleunen, M. (2011). Tradeoffs associated with constitutive and induced plant resistance against herbivory. Proceedings of the National Academy of Sciences, 108(14), 5685-5689.

Kessler, A., & Baldwin, I. T. (2002). Plant responses to insect herbivory: the emerging molecular analysis. Annual review of plant biology, 53(1), 299-328.

Khasa, Y. P. (2017). Microbes as Biocontrol Agents. In Probiotics and Plant Health (pp. 507-552): Springer.

Kloepper, J. W., & Schroth, M. N. (1978). Plant growth-promoting rhizobacteria on radishes. Paper presented at the Proceedings of the 4th international conference on plant pathogenic bacteria.

Knoth, J. L., Kim, S. H., Ettl, G. J., & Doty, S. L. (2014). Biological nitrogen fixation and biomass accumulation within poplar clones as a result of inoculations with diazotrophic endophyte consortia. New Phytologist, 201(2), 599-609.

Koch, A. L. (2001). Oligotrophs versus copiotrophs. Bioessays, 23(7), 657-661.

Kohler, A., Rinaldi, C., Duplessis, S., Baucher, M., Geelen, D., Duchaussoy, F., . . . Martin, F. (2008). Genome-wide identification of NBS resistance genes in Populus trichocarpa. Plant molecular biology, 66(6), 619-636.

Kopp, R., Smart, L., Maynard, C., Isebrands, J., Tuskan, G., & Abrahamson, L. (2001). The development of improved willow clones for eastern North America. The Forestry Chronicle, 77(2), 287-292.

Kumar, A., Saini, P., & Shrivastava, J. (2009). Production of peptide antifungal antibiotic and biocontrol activity of Bacillus subtilis.

Kuske, C. R., Ticknor, L. O., Miller, M. E., Dunbar, J. M., Davis, J. A., Barns, S. M., & Belnap, J. (2002). Comparison of soil bacterial communities in rhizospheres of three plant species and the interspaces in an arid grassland. Applied and Environmental Microbiology, 68(4), 1854-1863.

Kuzovkina, Y. A., & Quigley, M. F. (2005). Willows beyond wetlands: uses of Salix L. species for environmental projects. Water, Air, and Soil Pollution, 162(1-4), 183-204.

Kuzovkina, Y. A., Weih, M., Romero, M. A., Charles, J., Hust, S., McIvor, I., . . . Teodorescu, T. I. (2008). Salix: botany and global horticulture. Horticultural Reviews, Volume 34, 447-489.

Labrecque, M., & Teodorescu, T. I. (2005). Field performance and biomass production of 12 willow and poplar clones in short-rotation coppice in southern Quebec (Canada). Biomass and Bioenergy, 29(1), 1-9.

Lamp, W. O., Nielsen, G. R., & Danielson, S. D. (1994). Patterns among host plants of potato leafhopper, Empoasca fabae (Homoptera: Cicadellidae). Journal of the Kansas Entomological Society, 354-368.

Lamp, W. O., Nielsen, G. R., & Dively, G. P. (1991). Insect pest-induced losses in alfalfa: patterns in Maryland and implications for management. Journal of economic entomology, 84(2), 610-618.

Lander, E. S., & Botstein, D. (1989). Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121(1), 185-199.

Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC bioinformatics, 9(1), 559.

Langmead, B., & Salzberg, S. L. (2012). Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), 357-359.

Larsson, S. (1998). Genetic improvement of willow for short-rotation coppice. Biomass and Bioenergy, 15(1), 23-26.

Larsson, S., & Lindegaard, K. (2003). Full scale implementation of short rotation willow coppice SRC in Sweden. Orebro: Agrobransle AB.

Lata, H., Li, X., Silva, B., Moraes, R., & Halda-Alija, L. (2006). Identification of IAA-producing endophytic bacteria from micropropagated Echinacea plants using 16S rRNA sequencing. Plant cell, tissue and organ culture, 85(3), 353-359.

Lauber, C. L., Strickland, M. S., Bradford, M. A., & Fierer, N. (2008). The influence of soil properties on the structure of bacterial and fungal communities across land-use types. Soil Biology and Biochemistry, 40(9), 2407-2415.

Lauron-Moreau, A., Pitre, F. E., Brouillet, L., & Labrecque, M. (2013). Microsatellite markers of willow species and characterization of 11 polymorphic microsatellites for Salix eriocephala (Salicaceae), a potential native species for biomass production in Canada. Plants, 2(2), 203-210.

Layeghifard, M., Hwang, D. M., & Guttman, D. S. (2016). Disentangling interactions in the microbiome: a network perspective. Trends in microbiology.

Lee, S. J., Warnick, T. A., Pattathil, S., Alvelo-Maurosa, J. G., Serapiglia, M. J., McCormick, H., . . . Smart, L. B. (2012). Biological conversion assay using Clostridium phytofermentans to estimate plant feedstock quality. Biotechnology for biofuels, 5(1), 5.

Lewis, J. D., Lee, A. H.-Y., Hassan, J. A., Wan, J., Hurley, B., Jhingree, J. R., . . . Guttman, D. S. (2013). The Arabidopsis ZED1 pseudokinase is required for ZAR1-mediated immunity induced by the Pseudomonas syringae type III effector HopZ1a. Proceedings of the National Academy of Sciences, 110(46), 18722-18727.

Li, R., Tee, C.-S., Jiang, Y.-L., Jiang, X.-Y., Venkatesh, P. N., Sarojam, R., & Ye, J. (2015). A terpenoid phytoalexin plays a role in basal defense of Nicotiana benthamiana against Potato virus X. Scientific reports, 5.

Lindegaard, K., & Barker, J. (1997). Breeding willows for biomass. Aspects of Applied Biology, 49, 155-162.

Love, M. I., Huber, W., & Anders, S. (2014). Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome biology, 15(12), 550.


Lozupone, C. A., Stombaugh, J. I., Gordon, J. I., Jansson, J. K., & Knight, R. (2012). Diversity, stability and resilience of the human gut microbiota. Nature, 489(7415), 220-230.

Madhaiyan, M., Poonguzhali, S., Senthilkumar, M., Seshadri, S., Chung, H., Jinchul, Y., . . . Tongmin, S. (2004). Growth promotion and induction of systemic resistance in rice cultivar Co-47 (Oryza sativa L.) by Methylobacterium spp. Botanical Bulletin of Academia Sinica, 45.

Marone, D., Russo, M. A., Laidò, G., De Leonardis, A. M., & Mastrangelo, A. M. (2013). Plant nucleotide binding site–leucine-rich repeat (NBS-LRR) genes: active guardians in host defense responses. International journal of molecular sciences, 14(4), 7302-7326.

Mauch-Mani, B., Baccelli, I., Luna, E., & Flors, V. (2017). Defense priming: an adaptive part of induced resistance. Annual Review of Plant Biology, 68, 485-512.

McHale, L., Tan, X., Koehl, P., & Michelmore, R. W. (2006). Plant NBS-LRR proteins: adaptable guards. Genome biology, 7(4), 212.

Meera, M., Shivanna, M., Kageyama, K., & Hyakumachi, M. (1994). Plant growth promoting fungi from zoysiagrass rhizosphere as potential inducers of systemic resistance in cucumbers. Phytopathology, 84(12), 1399-1406.

Merzaeva, O., & Shirokikh, I. (2010). The production of auxins by the endophytic bacteria of winter rye. Applied Biochemistry and Microbiology, 46(1), 44-50.

Meyer, F., Paarmann, D., D'Souza, M., Olson, R., Glass, E. M., Kubal, M., . . . Wilke, A. (2008). The metagenomics RAST server–a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC bioinformatics, 9(1), 386.

Millard, P., & Singh, B. K. (2010). Does grassland vegetation drive soil microbial diversity? Nutrient Cycling in Agroecosystems, 88(2), 147-158.

Mishra, P. K., Bisht, S. C., Mishra, S., Selvakumar, G., Bisht, J., & Gupta, H. (2012). Coinoculation of Rhizobium leguminosarum-PR1 with a cold tolerant Pseudomonas sp. improves iron acquisition, nutrient uptake and growth of field pea (Pisum sativum L.). Journal of plant nutrition, 35(2), 243-256.

Mitchell, R., Schmer, M., Anderson, W., Jin, V., Balkcom, K., Kiniry, J., . . . White, P. (2016). Dedicated energy crops and crop residues for bioenergy feedstocks in the Central and Eastern USA. BioEnergy Research, 9(2), 384-398.

Mitchell, R., Vogel, K. P., & Uden, D. R. (2012). The feasibility of switchgrass for biofuel production. Biofuels, 3(1), 47-59.

Moya, A., & Ferrer, M. (2016). Functional redundancy-induced stability of gut microbiota subjected to disturbance. Trends in microbiology, 24(5), 402-413.

Myles, S., Chia, J.-M., Hurwitz, B., Simon, C., Zhong, G. Y., Buckler, E., & Ware, D. (2010). Rapid genomic characterization of the genus vitis. PloS one, 5(1), e8219.


Naylor, D., DeGraaf, S., Purdom, E., & Coleman-Derr, D. (2017). Drought and host selection influence bacterial community dynamics in the grass root microbiome. The ISME journal, 11(12), 2691.

Nielsen, G. R., Lamp, W. O., & Stutte, G. W. (1990). Potato leafhopper (Homoptera: Cicadellidae) feeding disruption of phloem translocation in alfalfa. Journal of economic entomology, 83(3), 807-813.

Oksanen, J. (2015). Vegan: an introduction to ordination. Retrieved from https://cran.r-project.org/web/packages/vegan/vignettes/intro-vegan.pdf

Oldroyd, G. E., Murray, J. D., Poole, P. S., & Downie, J. A. (2011). The rules of engagement in the legume-rhizobial symbiosis. Annual review of genetics, 45, 119-144.

Openshaw, K. (2000). A review of Jatropha curcas: an oil plant of unfulfilled promise. Biomass and bioenergy, 19(1), 1-15.

Pacaldo, R. S., Volk, T. A., & Briggs, R. D. (2013). Greenhouse gas potentials of shrub willow biomass crops based on below-and aboveground biomass inventory along a 19-year chronosequence. BioEnergy Research, 6(1), 252-262.

Pandey, P., & Maheshwari, D. (2007). Two-species microbial consortium for growth promotion of Cajanus cajan. Current science, 1137-1142.

Pascual, M. B., El-Azaz, J., Fernando, N., Cañas, R. A., Avila, C., & Cánovas, F. M. (2016). Biosynthesis and metabolic fate of phenylalanine in conifers. Frontiers in plant science, 7.

Pinton, R., Varanini, Z., & Nannipieri, P. (2007). The rhizosphere: biochemistry and organic substances at the soil-plant interface: CRC press.

Prakash, T., & Taylor, T. D. (2012). Functional assignment of metagenomic data: challenges and applications. Briefings in bioinformatics, 13(6), 711-727.

Pucholt, P., Rönnberg-Wästljung, A.-C., & Berlin, S. (2015). Single locus sex determination and female heterogamety in the basket willow (Salix viminalis L.). Heredity, 114(6), 575-583.

Pucholt, P., Wright, A. E., Conze, L. L., Mank, J. E., & Berlin, S. (2017). Recent sex chromosome divergence despite ancient dioecy in the willow Salix viminalis. Molecular Biology and Evolution, msx144.

Pérez-Rodríguez, P., Riaño-Pachón, D. M., Corrêa, L. G. G., Rensing, S. A., Kersten, B., & Mueller-Roeber, B. (2009). PlnTFDB: updated content and new features of the plant transcription factor database. Nucleic acids research, 38(suppl_1), D822-D827.

Raaijmakers, J. M., Paulitz, T. C., Steinberg, C., Alabouvette, C., & Moënne-Loccoz, Y. (2009). The rhizosphere: a playground and battlefield for soilborne pathogens and beneficial microorganisms. Plant and soil, 321(1-2), 341-361.


Rasmann, S., & Agrawal, A. A. (2009). Plant defense against herbivory: progress in identifying synergism, redundancy, and antagonism between resistance traits. Current opinion in plant biology, 12(4), 473-478.

Ray, M. J., Brereton, N. J., Shield, I., Karp, A., & Murphy, R. J. (2012). Variation in cell wall composition and accessibility in relation to biofuel potential of short rotation coppice willows. Bioenergy Research, 5(3), 685-698.

Reichenauer, T. G., & Germida, J. J. (2008). Phytoremediation of organic contaminants in soil and groundwater. ChemSusChem, 1(8‐9), 708-717.

Richard, T. L. (2010). Challenges in scaling up biofuels infrastructure. Science, 329(5993), 793-796.

Riesenfeld, C. S., Schloss, P. D., & Handelsman, J. (2004). Metagenomics: genomic analysis of microbial communities. Annu. Rev. Genet., 38, 525-552.

Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

Rodrigues, J. L., Pellizari, V. H., Mueller, R., Baek, K., Jesus, E. d. C., Paula, F. S., . . . Feigl, B. (2013). Conversion of the Amazon rainforest to agriculture results in biotic homogenization of soil bacterial communities. Proceedings of the National Academy of Sciences, 110(3), 988-993.

Rousk, J., Bååth, E., Brookes, P. C., Lauber, C. L., Lozupone, C., Caporaso, J. G., . . . Fierer, N. (2010). Soil bacterial and fungal communities across a pH gradient in an arable soil. The ISME journal, 4(10), 1340-1351.

Rushton, P. J., & Somssich, I. E. (1998). Transcriptional control of plant genes responsive to pathogens. Current opinion in plant biology, 1(4), 311-315.

Rönnberg-Wästljung, A., Glynn, C., & Weih, M. (2005). QTL analyses of drought tolerance and growth for a Salix dasyclados× Salix viminalis hybrid in contrasting water regimes. Theoretical and Applied Genetics, 110(3), 537-549.

Rönnberg‐Wästljung, A., Åhman, I., Glynn, C., & Widenfalk, O. (2006). Quantitative trait loci for resistance to herbivores in willow: field experiments with varying soils and climates. Entomologia experimentalis et applicata, 118(2), 163-174.

Samils, B., Rönnberg-Wästljung, A.-C., & Stenlid, J. (2011). QTL mapping of resistance to leaf rust in Salix. Tree genetics & genomes, 7(6), 1219-1235.

Santiago, R., Barros-Rios, J., & Malvar, R. A. (2013). Impact of cell wall composition on maize resistance to pests and diseases. International journal of molecular sciences, 14(4), 6960-6980.

Saxena, A., & Tilak, K. (1998). Free-living nitrogen fixers: Its role in crop production. Microbes for Health, Wealth and Sustainable Environment, Malhotra Publ Co, New Delhi. Edited by Verma AK, 25-64.


Schlaeppi, K., & Bulgarelli, D. (2015). The plant microbiome at work. Molecular Plant-Microbe Interactions, 28(3), 212-217.

Schutter, M., Sandeno, J., & Dick, R. (2001). Seasonal, soil type, and alternative management influences on microbial communities of vegetable cropping systems. Biology and Fertility of Soils, 34(6), 397-410.

Scott, S. A., Davey, M. P., Dennis, J. S., Horst, I., Howe, C. J., Lea-Smith, D. J., & Smith, A. G. (2010). Biodiesel from algae: challenges and prospects. Current opinion in biotechnology, 21(3), 277-286.

Serapiglia, M. J., Gouker, F. E., & Smart, L. B. (2014). Early selection of novel triploid hybrids of shrub willow with improved biomass yield relative to diploids. BMC plant biology, 14(1), 74.

Serdyuk, O. P., Smolygina, L. D., Ivanova, E. P., Firsov, A. P., & Pogrebnoi, P. V. (2000). 4-Hydroxyphenethyl alcohol—a new cytokinin-like substance isolated from phototrophic bacterium Rhodospirillum rubrum. Exhibition of activity on plants and transformed mammalian cells. Process Biochemistry, 36(5), 475-479.

Shield, I., Macalpine, W., Hanley, S., & Karp, A. (2015). Breeding willow for short rotation coppice energy cropping. In Industrial Crops (pp. 67-80): Springer.

Singh, S., Pancholy, A., Jindal, S., & Pathak, R. (2014). Effect of co-inoculations of native PGPR with nitrogen fixing bacteria on seedling traits in Prosopis cineraria. Journal of environmental biology, 35(5), 929.

Skvortsov, A. (1968). Willows of the USSR. Moscow: Nauka (in Russian).

Skvortsov, A. (1999). Willows of Russia and adjacent countries. Taxonomical and geographical revision. University of Joensuu, Faculty of Mathematics and Natural Sciences, Report Series# 39. Joensuu. English translation.

Smart, L., & Cameron, K. (2012). Shrub willow. In C. Kole, S. Joshi, & D. Shonnard (Eds.), Handbook of bioenergy crop plants (pp. 687-708).

Smart, L., Cameron, K., Volk, T., & Abrahamson, L. (2007). Breeding, selection and testing of shrub willow as a dedicated energy crop. Breeding, selection and testing of shrub willow as a dedicated energy crop.(19), 85-92.

Smart, L. B., & Cameron, K. D. (2008). Genetic improvement of willow (Salix spp.) as a dedicated bioenergy crop. In Genetic improvement of bioenergy crops (pp. 377-396): Springer.

Smart, L. B., Volk, T. A., Lin, J., Kopp, R. F., Phillips, I. S., Cameron, K. D., . . . Abrahamson, L. P. (2005). Genetic improvement of shrub willow (Salix spp.) crops for bioenergy and environmental applications in the United States. UNASYLVA-FAO-, 56(2), 51.

Song, J., & Bent, A. F. (2014). Microbial pathogens trigger host DNA double-strand breaks whose abundance is reduced by plant defense responses. PLoS pathogens, 10(4), e1004030.


Stanton, B. J., Serapiglia, M. J., & Smart, L. B. (2014a). The domestication and conservation of Populus and Salix genetic resources. Poplars and willows: trees for society and the environment. Wallingford, UK: CAB International, 124-199.

Stanton, B. J., Serapiglia, M. J., & Smart, L. B. (2014b). The domestication and conservation of Populus and Salix genetic resources. Poplars and willows: trees for society and the environment. Wallingford, UK: CAB International, 124-199.

Stoof, C. R., Richards, B. K., Woodbury, P. B., Fabio, E. S., Brumbach, A. R., Cherney, J., . . . Hornesky, J. (2015). Untapped potential: opportunities and challenges for sustainable bioenergy production from marginal lands in the Northeast USA. BioEnergy Research, 8(2), 482-501.

Sørensen, J. (1997). The rhizosphere as a habitat for soil microorganisms. In Modern soil microbiology (pp. 21-45): Marcel Dekker Incorporated.

Tani, A., Sahin, N., Fujitani, Y., Kato, A., Sato, K., & Kimbara, K. (2015). Methylobacterium species promoting rice and barley growth and interaction specificity revealed with whole-cell matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry (MALDI-TOF/MS) analysis. PloS one, 10(6), e0129509.

Taiz, L., & Zeiger, E. (2010). Photosynthesis: the light reactions. Plant physiology, 5, 163-198.

Taylor, P., & Shields, E. (1995). Development of migrant source populations of the potato leafhopper (Homoptera: Cicadellidae). Environmental entomology, 24(5), 1115-1121.

Terrazas, R. A., Giles, C., Paterson, E., Robertson-Albertyn, S., Cesco, S., Mimmo, T., . . . Bulgarelli, D. (2016). Chapter One-Plant–Microbiota Interactions as a Driver of the Mineral Turnover in the Rhizosphere. Advances in applied microbiology, 95, 1-67.

Thimm, O., Bläsing, O., Gibon, Y., Nagel, A., Meyer, S., Krüger, P., . . . Stitt, M. (2004). mapman: a user‐driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes. The Plant Journal, 37(6), 914-939.

Tilak, K., Ranganayaki, N., Pal, K., De, R., Saxena, A., Nautiyal, C. S., . . . Johri, B. (2005). Diversity of plant growth and soil health supporting bacteria. Current science, 136-150.

Tilman, D., Cassman, K. G., Matson, P. A., Naylor, R., & Polasky, S. (2002). Agricultural sustainability and intensive production practices. Nature, 418(6898), 671-677.

Tokala, R. K., Strap, J. L., Jung, C. M., Crawford, D. L., Salove, M. H., Deobald, L. A., . . . Morra, M. (2002). Novel plant-microbe rhizosphere interaction involving Streptomyces lydicus WYEC108 and the pea plant (Pisum sativum). Applied and environmental microbiology, 68(5), 2161-2171.

Tolbert, V., & Schiller, A. (1996). Environmental enhancement using short-rotation woody crops and perennial grasses as alternative agricultural crops. Retrieved from

Tolbert, V. R., & Wright, L. L. (1998). Environmental enhancement of US biomass crop technologies: research results to date. Biomass and Bioenergy, 15(1), 93-100.


Trivedi, P., Schenk, P. M., Wallenstein, M. D., & Singh, B. K. (2017). Tiny Microbes, Big Yields: enhancing food crop production with biological solutions. Microbial Biotechnology, 10(5), 999-1003.

Tsarouhas, V., Gullberg, U., & Lagercrantz, U. (2003). Mapping of quantitative trait loci controlling timing of bud flush in Salix. Hereditas, 138(3), 172-178.

Tsarouhas, V., Gullberg, U., & Lagercrantz, U. (2004). Mapping of quantitative trait loci (QTLs) affecting autumn freezing resistance and phenology in Salix. Theoretical and applied genetics, 108(7), 1335-1342.

Tsavkelova, E., Cherdyntseva, T., & Netrusov, A. (2005). Auxin production by bacteria associated with orchid roots. Microbiology, 74(1), 46-53.

Tsavkelova, E. A., Cherdyntseva, T. A., Klimova, S. Y., Shestakov, A. I., Botina, S. G., & Netrusov, A. I. (2007). Orchid-associated bacteria produce indole-3-acetic acid, promote seed germination, and increase their microbial yield in response to exogenous auxin. Archives of Microbiology, 188(6), 655-664.

Turner, S. D. (2014). qqman: an R package for visualizing GWAS results using QQ and manhattan plots. BioRxiv, 005165.

Tuskan, G., DiFazio, S., & Teichmann, T. (2004). Poplar genomics is getting popular: the impact of the poplar genome project on tree research. Plant Biology, 7(01), 2-4.

Tuskan, G. A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., . . . Salamov, A. (2006a). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). science, 313(5793), 1596-1604.

Tuskan, G. A., Difazio, S., Jansson, S., Bohlmann, J., Grigoriev, I., Hellsten, U., . . . Salamov, A. (2006b). The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). science, 313(5793), 1596-1604.

U.S. Energy Information Administration (EIA). (2017, November). Monthly Energy Review. Retrieved from www.eia.gov/mer

Volk, T., Abrahamson, L., Nowak, C., Smart, L., Tharakan, P., & White, E. (2006a). The development of short-rotation willow in the northeastern United States for bioenergy and bioproducts, agroforestry and phytoremediation. Biomass and Bioenergy, 30(8), 715-727.

Volk, T., Abrahamson, L., Nowak, C., Smart, L., Tharakan, P., & White, E. (2006b). The development of short-rotation willow in the northeastern United States for bioenergy and bioproducts, agroforestry and phytoremediation. Biomass and Bioenergy, 30(8), 715-727.

Von Mark, V. C., & Dierig, D. A. (2014). Industrial crops: breeding for bioenergy and bioproducts (Vol. 9): Springer.


Vukmir-Delic, D., Lugic, Z., Radin, D., Knezevic-Vukcevic, J., & Simic, D. (1994). Presence and density of root nodulating Rhizobium meliloti bacteria in different soil types of the Krusevac region. Acta Biologica Iugoslavica, Series B. Mikrobiologijca, 31(2), 117-128.

Walley, J. W., Kliebenstein, D. J., Bostock, R. M., & Dehesh, K. (2013). Fatty acids and early detection of pathogens. Current opinion in plant biology, 16(4), 520-526.

Wardle, D. A. (2006). The influence of biotic interactions on soil biodiversity. Ecology letters, 9(7), 870-886.

Weber, K. (1907). Aufbau und vegetation der Moore Norddeutschlands.

The Web Soil Survey database, version 3.3, produced by The National Cooperative Soil Survey, operated by the USDA Natural Resources Conservation Service, https://websoilsurvey.nrcs.usda.gov/app/, accessed March 14, 2018.

Weih, M., Rönnberg‐Wästljung, A. C., & Glynn, C. (2006). Genetic basis of phenotypic correlations among growth traits in hybrid willow (Salix dasyclados× S. viminalis) grown under two water regimes. New Phytologist, 170(3), 467-477.

Wilke, A., Harrison, T., Wilkening, J., Field, D., Glass, E. M., Kyrpides, N., . . . Meyer, F. (2012). The M5nr: a novel non-redundant database containing protein sequences and annotations from multiple sources and associated tools. BMC bioinformatics, 13(1), 141.

Wojnar, Z., Banach, T., & Hirschberger, A. M. (2010). Renewable fuels roadmap and sustainable biomass feedstock supply for New York. Forest, 4(8.2), 8.2.

Wong, W.-T., Tseng, C.-H., Hsu, S.-H., Lur, H.-S., Mo, C.-W., Huang, C.-N., . . . Liu, C.-T. (2014). Promoting effects of a single Rhodopseudomonas palustris inoculant on plant growth by Brassica rapa chinensis under low fertilizer input. Microbes and environments, 29(3), 303-313.

Wrage, N., Velthof, G., Van Beusichem, M., & Oenema, O. (2001). Role of nitrifier denitrification in the production of nitrous oxide. Soil biology and Biochemistry, 33(12), 1723-1732.

Xin, G., Glawe, D., & Doty, S. L. (2009). Characterization of three endophytic, indole-3-acetic acid-producing yeasts occurring in Populus trees. mycological research, 113(9), 973-980.

Xin, G., Zhang, G., Kang, J. W., Staley, J. T., & Doty, S. L. (2009). A diazotrophic, indole-3-acetic acid-producing endophyte from wild cottonwood. Biology and Fertility of Soils, 45(6), 669-674.

Yang, S., Zhang, X., Yue, J.-X., Tian, D., & Chen, J.-Q. (2008). Recent duplications dominate NBS-encoding gene expansion in two woody species. Molecular Genetics and Genomics, 280(3), 187-198.

Yazaki, K. (2005). Transporters of secondary metabolites. Current opinion in plant biology, 8(3), 301-307.


Zakrzewski, M., Proietti, C., Ellis, J. J., Hasan, S., Brion, M.-J., Berger, B., & Krause, L. (2016). Calypso: a user-friendly web-server for mining and visualizing microbiome–environment interactions. Bioinformatics, 33(5), 782-783.

Zhao, Q.-Y., Gratten, J., Restuadi, R., & Li, X. (2016). Mapping and differential expression analysis from short-read RNA-Seq data in model organisms. Quantitative Biology, 4(1), 22-35.

Zhou, C. L. E., & Backus, E. A. (1999). Phloem injury and repair following potato leafhopper feeding on alfalfa stems. Canadian journal of botany, 77(4), 537-547.

Zhou, R., Carlson, C. H., Gouker, F. E., Rodgers-Melnick, E., Tang, H., Evans, L. M., Town, C. D., Krishnakumar, V., Schmutz, J., & Tuskan, G. A. (2017). Sex determination in Salix purpurea (submitted).


Wanyan Wang

EDUCATION

The Pennsylvania State University, Huck Institutes of the Life Sciences, PA (Aug. 2011 - present)
PhD candidate, Plant Biology Program. Advisor: John E. Carlson

China Agricultural University, College of Agriculture and Biotechnology, China (Sept. 2007 - July 2011)
Bachelor of Agronomy, major in Plant Protection. GPA 3.82/4.0; Ranking: 5/122

AWARDS AND SCHOLARSHIPS
o Penn State Huck Travel Stipend Award, International Plant & Animal Genome XXV Conference (Jan. 2017)
o The Braddock Scholarship & The FEGR Award, Penn State (Aug. 2011)
o First Prize of Excellent Undergraduate Scholarship (16/314), China Agricultural University (Dec. 2010)
o Best Student Scholarship (6/314), China Agricultural University (Dec. 2009)
o National Merit Scholarship (8/314), Ministry of Education of the People's Republic of China (Nov. 2008)
o Top Prize of Excellent Undergraduate Scholarship (3/314), China Agricultural University (Nov. 2008)

RESEARCH EXPERIENCE

Research Assistant, The Pennsylvania State University (Aug. 2011 - present)
Doctor of Philosophy (Ph.D.), Huck Institutes of the Life Sciences, State College, PA. Advisor: Dr. John E. Carlson
o RNA-Seq analysis of variation among willow cultivars in their response to the herbivore potato leafhopper; identified resistance genes and pathways.
o Metagenomic study of soil microbial communities and their functional attributes in the willow rhizosphere; identified factors that affect microbial community structure, as well as promising plant-growth-promoting rhizobacteria.

National College Student Innovative Research Project (Oct. 2009 - May 2011)
China Agricultural University, Beijing. Mentor: Dr. Wenxian Sun
o Studied the role of the PhoPQ two-component signal transduction system in the virulence of the rice leaf streak pathogen Xanthomonas oryzae.
o Project leader: applied for project funding, managed the experimental process, reported and discussed progress regularly with my mentor, and solved problems that arose during the experiments.

The Summer Intern Training Program (Jul. - Aug. 2010)
National Institute of Biological Sciences, Beijing. Mentor: Dr. Jianmin Zhou
o Project title: “Detail the Signal Transduction Pathway in Plant Innate Immune Response in Arabidopsis thaliana”
o Established an Arabidopsis protoplast transient expression system based on flg22-inducible signaling pathways.

The Undergraduate Thesis Project (Nov. 2010 - June 2011)
National Institute of Biological Sciences, Beijing. Mentor: Dr. Yan Guo
o Project title: “Plant ATPase-AHA2’s Regulation on Salt-resistance in Type Plant Arabidopsis thaliana”

PUBLICATIONS AND PRESENTATIONS
o Wang W., Carlson J. E., Carlson C. H., Smart L. B. Transcriptome Analysis of Contrasting Resistance to Herbivory by Empoasca fabae in Two Shrub Willow Genotypes and Their Progeny. To be submitted.
o Wang W., Carlson J. E., Fabio E. S., Smart L. B. Comparative Metagenomics Reveals the Effects of Geography and Host Genotype on Willow Rhizosphere Microbial Community. To be submitted.
o Wang W., Carlson J. E., Carlson C. H., Smart L. B. Transcriptome Analysis of Resistance of Shrub Willow to Empoasca fabae. 33rd Mid Atlantic Plant Molecular Biology Society Annual Conference, College Park, MD (Conference Talk).
o Wang W., Carlson J. E., Fabio E. S., Smart L. B. Plant-Microbe Communication in the Shrub Willow Rhizosphere: Microbiome Structure, Function and Crop Yield. International Plant & Animal Genome XXVI Conference, San Diego, CA (Conference Talk).