University of Vermont ScholarWorks @ UVM
Graduate College Dissertations and Theses Dissertations and Theses
2019
Natural Selection For Disease Resistance In Hybrid Poplars Targets Stomatal Patterning Traits And Regulatory Genes.
Karl Christian Fetter University of Vermont
Follow this and additional works at: https://scholarworks.uvm.edu/graddis
Part of the Plant Pathology Commons
Recommended Citation Fetter, Karl Christian, "Natural Selection For Disease Resistance In Hybrid Poplars Targets Stomatal Patterning Traits And Regulatory Genes." (2019). Graduate College Dissertations and Theses. 1162. https://scholarworks.uvm.edu/graddis/1162
This Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks @ UVM. It has been accepted for inclusion in Graduate College Dissertations and Theses by an authorized administrator of ScholarWorks @ UVM. For more information, please contact [email protected]. NATURAL SELECTION FOR DISEASE RESISTANCE IN HYBRID POPLARS TARGETS STOMATAL PATTERNING TRAITS AND REGULATORY GENES.
A Dissertation Presented
by Karl Christian Fetter to The Faculty of the Graduate College of The University of Vermont
In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy Specializing in Plant Biology
October, 2019
Defense Date: August 19th,2019 Dissertation Examination Committee:
Stephen R. Keller, Ph.D., Advisor Melissa Pespeni, Ph.D., Chairperson Terrence Delaney, Ph.D. Charles Goodnight, Ph.D. Jeanne Harris, Ph.D. Cynthia J. Forehand, Ph.D., Dean of Graduate College Abstract
The evolution of disease resistance in plants occurs within a framework of interacting phenotypes, balancing natural selection for life-history traits along a continuum of fast-growing and poorly defended, or slow-growing and well-defended lifestyles. Plant populations connected by gene flow are physiologically limited to evolving along a single axis of the spectrum of the growth-defense trade-o , and strong local selection can purge phenotypic variance from a population or species, making it di cult to detect variation linked to the trade-o . Hybridization between two species that have evolved di erent growth-defense trade-o optima can reveal trade-o s hidden in either species by introducing phenotypic and genetic variance. Here, I investigated the phenotypic and genetic basis for variation of disease resistance in a set of naturally formed hybrid poplars. The focal species of this dissertation were the balsam poplar (Populus balsamifera), black balsam poplar (P. trichocarpa), narrowleaf cottonwood (P. angustifolia), and eastern cottonwood (P. deltoides). Vegetative cuttings of samples were collected from natural populations and clonally replicated in a common garden. Ecophysiology and stomata traits, and the severity of poplar leaf rust disease (Melampsora medusae) were collected. To overcome the methodological bottleneck of manually phenotyping stomata density for thousands of cuticle micrographs, I developed a publicly available tool to automatically identify and count stomata. To identify stomata, a deep con- volutional neural network was trained on over 4,000 cuticle images of over 700 plant species. The neural network had an accuracy of 94.2% when applied to new cuticle images and phenotyped hundreds of micrographs in a matter of minutes. To understand how disease severity, stomata, and ecophysiology traits changed as a result of hybridization, statistical models were fit that included the expected proportion of the genome from either parental species in a hybrid. These models in- dicated that the ratio of stomata on the upper surface of the leaf to the total number of stomata was strongly linked to disease, was highly heritable, and wass sensitive to hybridization. I further investigated the genomic basis of stomata-linked disease variation by performing an association genetic analysis that explicitly incorporated admixture. Positive selection in genes involved in guard cell regulation, immune sys- tem negative regulation, detoxification, lipid biosynthesis, and cell wall homeostasis were identified. Together, my dissertation incorporated advances in image-based phenotyping with evolutionary theory, directed at understanding how disease frequency changes when hybridization alters the genomes of a population.
i Citations
Material from this dissertation has been published in the following form:
Fetter, K. C., Gugger, P. F., and Keller, S. R.. (2017). Landscape Genomics of Angiosperm Trees: From Historic Roots to Discovering New Branches of Adaptive Evolution. In: Groover A., Cronk Q. (eds) Comparative and Evolutionary Genomics of Angiosperm Trees. Plant Genetics and Genomics: Crops and Models, vol 21. Springer, Cham.
Fetter, K. C., Eberhardt, S., Barclay, R. S., Wing, S. and Keller, S. R.., (2019). StomataCounter: a neural network for automatic stomata identification and counting, New Phytologist, vol 223, no. 3, 1671-1681.
ii You picked the right time to accept this diploma,
As the whole family is thankful for an August in Vermont.
And after sleepless nights and years in snow capped mountains,
The family can unite for the pride of progress.
Take the time to enjoy it.
For your gluttony for punishment awaits in the Canadian prairies.
CHURCH STREET - / /
iii Acknowledgements
The philosophiae doctor is an achievement one can obtain only with the guidance and encouragement of a cadre of professors, post-doctoral researchers, fellow gradu- ate students, and peers. I have been fortunate to find myself surrounded by people whose desire to increase and di use knowledge inspired me to do the same. The journey I took to write the words in this dissertation was characterized, or one could say beset, by learning to listen carefully, embracing doubt and questions, and devel- oping a healthy relationship with failure. Some of the traits required to complete the philosophiae doctor I brought with me to Vermont, but most were taught to me by those in my scientific network in Burlington. I want to thank all of you for being there for me, for challenging me, and for believing in me. In particular, I would like to thank my Vermont family: Dr. Stephen Keller, Brittany Verrico, Dr. Vikram Chhattre, Ethan Thibaut, Jamie Waterman, Dr. Kattia Palacio-Lopez, Susan Fawcett, Dr. Michael Sundue, Dr. Dominik Thom, Dr. Jeanne Harris, Dr. David Barrington, Dr. Mary Tierney, Dr. Deb Neher, Tom Weicht, Dr. Yolanda Chen, and Dr. Andrea Lini. Porky Reade, Sarah Goodrich, and Karyn McGovern were the best administrative sta I have ever worked with! The Smithsonian Institution was where the spark of intellectual curiosity inside me was turned into a welding torch. Unequivocally, I would not be the scientist I am today without Dr. Scott Wing, Dr. Carrol Hotton, Dr. Rich Barclay, Dr. Bill DiMichele, Dr. Nathan Jud, Dr. Jun Wen, and Dr. Michael Gates. I must also acknowledge the important contributions to my education by professors at the University of North Carolina: Dr. Alan Weakley, Dr. Pat Gensel, Dr. Paul Gabrielson, and Dr. Haven Wiley. Christiane Hawkins, Debbie Barbot, and Gordon
iv Warburton inspired me in the classroom and in the music room as a high school student, and I thank them for that. Finally, my family are the roots I have used for decades to stay up right. They have supported me in thousands of ways, inspired me, told me to keep going when the times were tough. There is no way to write how thankful I am to call as fam- ily Maureen Mallon, Harry Blair, Caitlin Brennan, Evan Brennan, Hudson Brennan, Poppy Brennan, Maegen Novak, Dr. David Novak, Reynolds Novak, Annie-Lee Wed- dle, Eileen Mallon, Charles Fetter, and Dr. Sydney Conrad.
Chaos, Horror, & Hospitality
v Contents
Citations ...... ii Dedication ...... iii Acknowledgements ...... iv List of Figures ...... xiii List of Tables ...... xv
1 Landscape genomics of angiosperm trees: from historic roots to dis- covering new branches of adaptive evolution 1 1.1 Introduction ...... 3 1.1.1 Historical Review ...... 6 1.1.2 Challenges and Solutions for Landscape Studies in Trees ... 19 1.1.3 Future Research Directions for Forest Tree Landscape Genomics 26 1.2 Conclusions ...... 35 1.3 Acknowledgements ...... 35 1.3.1 Author Contributions ...... 36 1.4 Tables ...... 37 1.5 Figures ...... 38
2 StomataCounter: a neural network for automatic stomata identifi- cation and counting. 67 2.1 Introduction ...... 69 2.2 Materials and Methods ...... 72 2.2.1 Biological Material ...... 72 2.2.2 Deep Convolutional Neural Network ...... 73 2.2.3 Statistical Analyses ...... 76 2.3 Results ...... 79 2.3.1 Stomata detection ...... 79 2.3.2 Classification accuracy ...... 81 2.4 Discussion ...... 82 2.5 Acknowledgements ...... 87 2.6 Data availability ...... 87 2.7 Author contributions ...... 88 2.8 Tables ...... 89 2.9 Figures ...... 92
3 Trade-o s and selection conflicts in hybrid poplars indicate the stom- atal ratio as an important trait regulating disease resistance. 105 3.1 Introduction ...... 107
vi 3.2 Materials & Methods ...... 110 3.2.1 Study system ...... 110 3.2.2 Materials ...... 111 3.2.3 Molecular analyses ...... 112 3.2.4 Trait measurements ...... 114 3.2.5 Quantitative genetic analyses ...... 117 3.3 Results ...... 120 3.3.1 Genetic ancestry and trait variation ...... 120 3.3.2 Selection trade-o s and conflicts ...... 123 3.4 Discussion ...... 124 3.5 Acknowledgements ...... 132 3.6 Data Accessibility ...... 132 3.7 Author Contributions ...... 132 3.8 Tables ...... 134 3.9 Figures ...... 137
4 Guard cell regulation, the plant immune system, and growth tran- scription factors are under selection and associated with fungal dis- ease variation in advanced generation P. balsamifera x trichocarpa hybrids. 159 4.1 Introduction ...... 162 4.2 Materials & Methods ...... 165 4.2.1 Plant collections, traits, and sequencing ...... 165 4.2.2 Admixture mapping ...... 166 4.2.3 Selection Scans ...... 168 4.2.4 Gene annotation and candidate gene filtering ...... 169 4.3 Results ...... 169 4.3.1 Global and local ancestry of hybrids ...... 169 4.3.2 Trait variation ...... 170 4.3.3 Admixture mapping ...... 171 4.3.4 Selection scans ...... 172 4.3.5 Candidate genes and local ancestry class phenotypic distributions172 4.4 Discussion ...... 173 4.4.1 Candidate genes ...... 175 4.4.2 Limitations of our study ...... 180 4.4.3 Conclusion ...... 181 4.5 Acknowledgements ...... 183 4.6 Data Accessibility ...... 183 4.7 Author contributions ...... 183 4.8 Tables ...... 184
vii 4.9 Figures ...... 192
viii List of Figures
1.1 Populations for landscape genomic studies are frequently sampled out of a context of spatially varying genomic structure and environmen- tal change. The genetic structure of P. balsamifera is represented from a sample from 412 reference SNPs genotyped by (Keller et al. 2010) with three demes observed: low diversity central and western demes, and a higher diversity eastern deme (a). Environmental gradi- ents abound in nature, and greenness onset is a classic example of a biological phenomenon, controlled by both photoperiod and tempera- ture, that changes across large geographic areas (b). Eliminating false positives in selection scans is always challenging; however, fewer false discoveries are observed from the tail of empirical p-value distributions derived from putatively neutral reference SNPs (c). This approach is demonstrated with the results of a BayeScan analysis (Foll and Gag- giotti, 2008) of 412 reference SNPs and 335 candidate SNPs from 27 flowering-time genes that were sampled from the localities in (a). Set- ting a threshold conditioned with empirical p-values identifies local se- lection in SNPs from GIGANTEA 5 (GI5 ), EARLY BOLTING IN SHORT DAYS (EBS), PHYTOCHROME-INTERACTING TRAN- SCRIPTION FACTOR 3 (PIF3 ), andCASEIN KINASE II BETA CHAIN 3 (CKB3.4 ). Allele frequency variation in four SNPs from GI5 are associated with latitude (d). GI5 is involved in light signaling to the circadian clock and local adaptation of these alleles may allow precise timing of phenology. Figures reproduced, with permission, from M. Fitzpatrick, (a); A. Elmore (b); S. R. Keller, (c, d)...... 39 1.2 Overlap of candidate genes for local adaptation inPopulus trichocarpa identified in selection scans by Geraldes et al. (2014), McKown et al. (2014)c, Holliday et al. (2016), and Evans et al. (2014). No overlap was detected by all studies, but six genes overlapped among three studies (underlined). Version 2.2 Gene annotations from Geraldes et al. (2014) were updated to 3.0 for comparisons...... 40
ix 2.1 Architecture of the deep convolutional neural network (DCNN) and classification tasks. Left: Training and testing procedure. First col- umn: Target patches were extracted and centered around human- labeled stomata center positions; distractor patches were extracted in all other regions. A binary image classification network was trained. Second column: The image classification network was applied fully convolutional to the test image to produce a prediction heatmap. On the thresholded heatmap, peaks were detected and counted...... 92 2.2 Sample sizes of images for the top 20 families represented in the train- ing (yellow) and test (blue) datasets (a). Examples of pre- and post- analysis images. A probability heatmap map is overlain onto the input image in the red channel. Detected stomata marked with circles with peak values given in green (b). Species in image pairs: Adiantum peru- vianum, Begonia loranthoides (top row); Chaemaerauthum gadacardii, Echeandia texensis (second row); Ginkgo biloba, Populus balsamifera (third row); Trillium luteum, Pilea libanensis (bottom row). Scale bars in row one and two equal 20 µm, scale bars in row three equal 25 µm, and scale bars in row four equal 50 µm...... 93 2.3 Results of human and automatic counts and precision for each dataset organized by collection source (a,d), taxonomic group (b), and mag- nification (c), and imaging method (e). Precision and image quality scores have non-linear relationships, and changes in variance correlate to image quality (g-i). Positive values of precision indicate undercount- ing of stomata relative to human counts, while negative values indicate overcounting. The dotted black line is the 1:1 line. See Table 2.2 for model summaries. Abbreviations: tEntropy, image entropy ;fMean, mean power-frequency tail distribution; fSTD, standard deviation of power-frequency tail distribution...... 94 2.4 Accuracy by training image count and the e ect of increasing the train- ing set size on classification accuracies. Since classification is a binary task, chance level is at 50%. Training image count “all” includes all 4,618 annotated training images...... 95
x 2.5 Samples that were mislabeled with high confidence by either the ma- chine or human. False negatives (left panel) from stomata that were detected by the human, but not by the machine. Image features typi- cally generating false negatives are blurry images, artifacts obstruction the stomata, low contrast between epidermal and guard cells, and very small scale stomata. False positives (right panel) are image features that are labeled by the machine as stomata, but not by the human counter, or are missed by the human counter (e.g. top and bottom right images in the panel). Errors typically generating false positives are image artifacts that superficially resemble stomata, particularly shapes mimicking interior guard cell structure...... 95 2.6 Accuracies for models trained on di erent training datasets (vertical), tested on di erent test datasets. Combined is a union of all training and test sets. For precision and recall values, see Supporting Information Fig. S2.3...... 96 S2.1 Stomata identification test results ...... 97 S2.2 Stomata identification in Arabidopsis cell-patterning mutants. .... 97 S2.3 Precision and recall of transferred training and testing datasets. ... 98
3.1 Summary of sample localities, phenotypes, and expected ancestry pro- portions for the trees under study. Sample locality map with the geo- graphic ranges of four species of poplar: P. balsamifera in blue, P. tri- chocarpa in green, P. angustifolia in orange, and P. deltoides in yellow (a). See Table S1 for population coordinates and samples sizes. The phylogenetic relationships of the four species under study are presented as a cladogram (b). The expected proportion of ancestry of hybrid genotypes was calculated from NewHybrid (Anderson and Thompson, 2002) estimates of filial generation (c). Histograms and box plots of three disease traits summarized by PC1 of a principal components anal- ysis (DPC1), relative growth rate (G), and stomatal ratio (SR). Tukey tests indicate significant di erences at –<0.05. PCA of stomatal patterning and ecophysiology trait data. The presence or absence of disease and the hybrid cross type is mapped onto the points, but was not used to decompose variance of the traits in principal component space (e). Disease presence/absence calls in e were made from the eigenvalues for DPC1, where PC1 eigenvalues greater than -0.8 were classified as diseased. Images of the uredospore (f) and disease sign of orange-yellow uredinia (g) of Melampsora medusae on an BxD hybrid. 138
xi 3.2 Results of partial least squares regression (PLS) of ancestry proportions and traits. B,D,T, and A represent vectors of the expected ancestry proportions for each individual estimated by NewHybrids (Table 3.2). The color ramp is the scalar product of variable loadings U’ and V’ from the X and Y matrices, respectively, indicating the correlation between pairs of vectors. Broad-sense heritability (H 2) estimates for each trait are given from data sets with hybrids present (black) and absent (red). 95% credible intervals indicated as whiskers. See text for PLS and H 2 estimation details and Table S3.2 for H2 estimates. ... 139 3.3 Backcrossing into P. balsamifera decreases disease (DPC1) (a) and the stomatal ratio (SR) (b). When accounting for the e ect of SR on disease, residual variance decreases from F1 to Fn generations and un- admixed P. balsamifera (c). Growth is uniform across the filial genera- tions (d). Di erences in Tukey’s post-hoc tests for significance (p-value < 0.05) indicated by letter...... 140 3.4 MANOVA regression coe cient heat map...... 141 3.5 Regression of stomatal ratio onto principal components 1 and 2 of disease severity and growth. Regression coe cients and significance indicated in table...... 142 S3.1 Distribution of clonal values. See Table 3.1 for trait abbreviations. .. 147 S3.2 Variation of traits by cross type. See Table 3.1 for trait abbreviations. 148 S3.3 Variation of traits by filial generation. See Table 3.1 for trait abbrevi- ations...... 149 S3.4 Permutation tests for significance of selection conflict products. The product of the regression coe cients for each pair of traits was esti- mated 1000 times with permuted data (grey distribution). The ob- served product is considered significant if its value is more extreme than the permuted distribution at –/2=0.025. See Table 3.4 for results.150
4.1 Map of collection localities in western North America with the range of P. balsamifera and P. trichocarpa colored in blue and green, respec- tively (a). Ranges from (Little, 1971). Traits distributions fall into three categories: approximately normal, left-skewed, and zero-inflated distributions (b). Global ancestry estimates from locus-specific ances- tries estimated by RASPberry (c). Local ancestry for each individual (in rows) and for every site (in columns) colored as gray if the locus is homozygous for the P. balsamifera allele, blue for heterozygous loci, and green for homozygous for the P. trichocrapa allele (d)...... 192
xii 4.2 Summary of logistic model and trait variation. Forest plot of standard- ized regression coe cients from a logistic model with disease presence as the response. Blue and red points are positive and negative odds ratios, respectively, and red line is an odds ratio of 1 (a). See Table S4.1 for logistic model output. Predicted values of disease presence for stomata ratio (SR) (b) and global ancestry (c). Biplot of SR and global ancestry (d). Global ancestry of 1 indicates an unadmixed P. balsamifera individual...... 193 4.3 Output of BMIX models represented in manhattan plots. All traits are plotted in the same space in a. Disease and stomata traits grouped in b, ecophysiology traits in c, stomata traits in d. Red points in b, c, and d indicate significant loci at –=0.05/237.5...... 194 4.4 Intersection plot of significant genes from BMIX results...... 195 4.5 Plot of µ statistics for three sets of individuals by chromosome. The overlap of BMIX candidate genes and the top 1% of µ statistic outliers are indicated by the solid and dashed boxes for the disease + stomata gene set, and disease only gene set, respectively. The density distribu- tion of µ-statistics is presented by set, and the vertical colored ticks on x-axis indicate the top 1% of values for each set...... 196 4.6 Patterns of stomata ratio (SR), disease severity (D1’), and relative growth rate (G) by local ancestry for genomic regions identified to be associated with stomata patterning and disease traits, in the top 1% of RAiSD selection windows, polymorphic, and with phenotypic dis- tributions indicative of true positive results. Local ancestry genotype 0 indicates site is homozygous for P. balsamfiera ancestry, and 1 in- dicates the site is heterozygous with P. trichocarpa contributing the other copy of the locus...... 197 S4.1 Global ancestry estimated from ADMIXTURE (Alexander et al., 2009) at K = 2 (top) and K = 7 (bottom)...... 200 S4.2 Individual manhattan plots of p-values from BMIX tests. See Table 4.2 for trait abbreviation definitions...... 201 S4.3 Individual manhattan plots of p-values from BMIX tests. See Table 4.2 for trait abbreviation definitions...... 202 S4.4 Individual manhattan plots of p-values from BMIX tests. See Table 4.2 for trait abbreviation definitions...... 203
xiii List of Tables
1.1 Annotations for six genes implicated in local adaptation from range- wide samples of Populus trichocarpa by four separate studies. (M) McKown et al. (2014c), (G) Geraldes et al. (2014), (H) Holliday et al. (2016), (E) Evans et al. (2014). Annotations retrieved from www.popgenie.org...... 37
2.1 Description of the datasets used for training and testing the network. Abbreviations: DIC, Di erential interference contrast; SEM, scanning electron microscopy...... 89 2.2 Summary of linear model fit parameters in Figure 2.3 for di erent test datasets. Dataset definitions given in text. Abbreviations: y-intercept, a; standard error, SE. 200x images removed from this USNM/USBG † set. Significance indicated by: ú: P<0.05; úú: P<0.01; úúú: P<0.001. 90 2.3 Root mean square error (RMSE) of the model residuals. Lower RMSE values suggest a better fit of the model. The response in each model was precision. Abbreviations: fMean, mean power-frequency tail dis- tribution; fSTD, standard deviation of power-frequency tail distribution. 91
3.1 Trait definitions, abbreviations, and the mean (µ) standard deviation ± (‡) of traits with hybrids present and absent...... 134 3.2 NewHybrids results. The Ref. non-Pb (0) and Ref. Pb (1) columns refer to the size of the reference populations from which segregating loci were selected. The remaning columns describe the sizes of the filial generations in unknown samples. Abbreviations: Ref., reference population; Pb., Populus balsamifera...... 134 3.3 Output of MANOVA with three disease severity traits jointly modelled as the response. Significance indicated as *** P < 0.001; ** P < 0.01; * P < 0.05; . < 0.1...... 135
xiv 3.4 Trade-o s and selection conflicts inferred from linear and multiple re- gression models of three traits: D, disease resistance; G, relative growth rate; and SR, stomatal ratio. Negative regression coe cients (—n)or path analysis (RSR,G) indicate a trade-o s is present. The path anal- ysis sums each independent path of the e ect of SR on G: path 1, —2; path 2, — — . Selection conflicts are inferred when the product of 3 ú 1 regression coe cients are negative (schluter1991conflicting). The hybrids absent set includes only unadmixed P. balsamifera, and the Tacamahaca and Aigeiros data sets add hybrid individuals. —1 was taken from model 1. Significance testing for selection conflicts was performed with permutation tests (Fig. S3.4). Significance indicated as *** P < 0.001; ** P < 0.01; * P < 0.05...... 136 S3.1 Summary of populations, samples sizes and geographic coordinates. . 144 S3.2 Broad-sense heritability estimates and credible intervals for the hybrids present and absent datasets...... 145 S3.3 Model output for trade-o regressions. Significance indicated as *** P < 0.001; ** P < 0.01; * P < 0.05...... 146
4.1 Population locality and sample size summaries for individuals used in admixture mapping. ref-Pb. and ref-Pt. are abbreviations for Populus balsamifera and P. trichocarpa that formed the reference sets. .... 185 4.2 Trait definitions and abbreviations...... 186 4.3 Summary of genes associated to stomatal patterning and disease traits also under positive selection. The number of genes found in each chro- mosomal region under selection is given beneath each chromosome. . 186 4.4 Average allelic e ects for substituting a P. balsamifera ancestry site (0) for P. trichocarpa (1). Phenotypes were scaled and standardized before calculating the allelic e ect. The allelic e ects for each site in the chromosomal region under consideration was identical. The phenotypic distributions for these sites are in Fig. 4.6...... 187 4.5 Descriptions of candidate genes. Arabidopsis orthologs were identified by the best BLAST hit from www.popgenie.org or through BLAST re- sults from The Arabidopsis Information Resource (www.arabidopsis.org). See text for details regarding candidate gene selection...... 188 S4.1 Output of logistic model of disease presence and traits...... 198 S4.2 Summary of populations used in the RAiSD selection scan analysis. Pb. is an abbreviation for Populus balsamifera...... 199
xv Chapter 1
Landscape genomics of angiosperm trees: from historic roots to dis- covering new branches of adap- tive evolution
Karl C. Fettera,1, Paul F. Guggerb, and Stephen R. Kellera aDepartment of Plant Biology, University of Vermont bAppalachian Laboratory, University of Maryland Center for Environmental Science 1Corresponding author: Karl Fetter (e-mail: [email protected], tel: +1 802 656 2930)
1 Abstract
Landscape genomic studies analyze spatial patterns of genetic variation to test hy- potheses about how demographic history, gene flow, and natural selection have shaped populations. For decades, angiosperm trees have served as outstanding model sys- tems for landscape-scale genetic studies due to their extensive geographic ranges, large e ective population sizes, abundant genetic diversity, and high gene flow. These char- acteristics were recognized early in the landscape genetics literature, and studies on angiosperm trees, particularly Populus and Quercus, tested hypotheses about how landscape features shaped neutral patterns of gene flow and population divergence. More recently, advances in sequencing and analysis methodologies have allowed for greater opportunities to directly test how natural selection acting locally across the landscape has shaped the genome-wide diversity of populations, often in the context of broad climatic gradients in growing season length. Despite, the methodological gains and successes of the last decade, landscape genomics studies face new challenges of study design, hypothesis testing, and validation. Here, we explore the development of landscape genetics and genomics in angiosperm trees and what we have learned from investigating the evolutionary consequences of life as a tree in heterogeneous landscapes. We outline the past, present, and potential future of landscape genomic studies in angiosperm trees, highlighting successes of the field, challenges to overcome, and ideas that scientists from all backgrounds engaged in landscape genomics should consider. 1.1 Introduction
The biological characteristics of some organisms lend themselves almost perfectly to the fields of study that have developed around them: Drosophila and population ge- netics; Caenorhabditis elegans and development; maize and plant architecture. Trees are, at first glance, unlikely model organisms, especially given their longevity and long generation time (Taylor, 2002); however, as models for ecological and landscape genomics, they combine a unique set of features (Petit and Hampe, 2006). Trees have large e ective population sizes (Petit and Hampe, 2006; Evans, Allan, et al., 2015), disperse genes widely through pollen and seed over long distances (Nathan et al., 2002), produce millions of progeny that encounter strong selection (Augspurger, 1984), can clonally propagate and persist for millennia (Ally et al., 2008), extensively hybridize with congeners (Moran et al., 2012), and have large lifetime exposures to pathogens and herbivores that can change annually (Floate et al., 2016). These varied characteristics result in complex evolutionary processes that play out in landscapes as small as a few hectares (Strei , 1999), to entire biomes (Keller, Olson, et al., 2010) and continents (Callahan et al., 2013), making trees excellent candidates for the study of how ecological selection and landscape features structure genetic diversity. There is a rich history of ecological genetic studies in forest trees beginning with provenance trials of the 18th century (Langlet, 1971). Foresters would collect range- wide samples of seeds or cuttings from diverse populations and propagate these into common gardens, often replicated in di erent climatic regions of the landscape, with the objective of assessing how genetics and environment influence growth and yield, especially patterns of vegetative bud phenology and the timing of adaptive seasonal
3 dormancy (Savolainen et al., 2007). While these e orts began well before the age of genomics, researchers were already recognizing the importance of climate, disease, and other sources of selection in structuring forest tree adaptive diversity. With the rise of the genomic era, ecological geneticists working on trees have been increasingly in a position to evaluate the genetic architecture of local adaptation, identify regions of the genome associated with climate, and accelerate the selection and breeding process of tree domestication.
The expansion of genomic resources in poplars (Populus, Tuskan et al., 2006), Eucalyptus (Myburg et al., 2014), oaks (Quercus , Plomion et al., 2016), and other forest-trees (Liang et al., 2011; Fang et al., 2013) was critically important for opening new avenues of research in ecological and landscape genomics, and in many ways, studies on trees drove this field forward. Populus emerged as the model for tree bi- ology in the 2000’s due to the ease of clonal propagation, manageable genome size (~480Mb), ecological diversity, extensive hybridization, the availability of genomic re- sources (such as its transformation by Agrobacterium), and its candidacy as a source of cellulosic bioethanol production (DiFazio, Leonardi, et al., 2012). The accumula- tion of population genetic and genomic studies in Populus has resulted in large gains in our understanding of the genetic architecture of phenology (Keller, N Levsen, Ol- son, et al., 2012; Olson et al., 2013; McKown, Guy, Quamme, et al., 2014; McKown, Guy, Klápöt , et al., 2014; McKown, Klápöt , et al., 2014), wood and biomass growth (Porth, Klapöte, et al., 2013), disease resistance (Porth, Klápöt , et al., 2015;Foster et al., 2015), and many other physiological processes. The Populus community is global and continues to grow in the fields of biomass production (Zhang et al., 2015), phytoremediation (Di Lonardo et al., 2011), the plant microbiome (Hacquard and
4 Schadt, 2015), and of course, among other fields, landscape genomics (Evans, GT Slavov, et al., 2014; Geraldes, Farzaneh, et al., 2014; Porth, Klápöt , et al., 2015).
More recently, Quercus sp. are emerging as models for the study of ecological and landscape genomics (Petit, Carlson, et al., 2013). Consisting of over 400 species, oaks are dominant in many Northern Hemisphere forests, and represent a large carbon sink in terrestrial landscapes (Myneni et al., 2001). Many oaks are foundational species that maintain biodiversity by providing habitat and food for numerous plants and animals (Block et al., 1990; Burns and Honkala, 1990; Fralish, 2004; McShea et al., 2007). They are also of high economic value for their lumber (Luppold and Bumgardner, 2008; Espinoza et al., 2011) and ecosystem services that include carbon sequestration, riparian bu ering and water quality enhancement (Dosskey et al., 1997; Kroeger et al., 2010), improvement of hunting and rangelands (Standiford and Howitt, 1993; Kroeger et al., 2010), improved nutrient cycling (Dahlgren et al., 1997; Herman et al., 2003), and cultural significance (Pavlik et al., 1991; Anderson, 2013). In this chapter, we explore the development of landscape genetics and genomics in angiosperm trees and what we have learned from investigating the evolutionary consequences of life as a tree in a heterogeneous landscape. We focus especially on
Populus and Quercus,which have made the greatest strides to date among landscape studies on angiosperm trees. We then highlight several areas where researchers in tree landscape genomics face challenges, or historically have been narrowly focused. Lastly, we o er recommendations for investigating overlooked aspects of tree biology and evolution in landscape genomics studies, and suggest new inquires for future research directions.
5 1.1.1 Historical Review
Landscape genetics: gene flow and population structure
Before the mid-2000’s and the onset of the next generation sequencing (NGS) revolu- tion, landscape genetic studies focused primarily on how genetic diversity of anony- mous markers were shaped by neutral demographic processes, including the e ects of gene flow, clonality, isolation-by-distance, range expansion, and population structure (Manel, Schwartz, et al., 2003). Polymorphic loci were rarely associated with adaptive signatures in allele frequency driven by environmental agents of selection (Gonzalez- Martinez et al., 2006), a fact largely due to the type of molecular markers employed (e.g. amplified-length polymorphisms, AFLPs; random amplified polymorphic DNA, RAPDs; and simple sequence repeats, SSRs/microsatellites) and the typically un- known location of these markers in a genome. Nevertheless, these loci were su cient to answer questions regarding demographic processes shaping genetic diversity. Early landscape genetic studies in angiosperm trees focused on both contemporary gene dispersal (VL Sork and Smouse, 2006) as well as larger-scale phylogeographic structure (Petit, Brewer, et al., 2002). Pollen and seed di er by the distances they are dispersed, their dispersal vectors, the genomes they carry, and their ploidy level; traits that profoundly influence spatial patterns of diversity (McCauley, 1995; Petit and Hampe, 2006). Elegant early studies of landscape genetics in white oaks by Petit and colleagues laid the foundation for our understanding of how long-distance seed dispersal creates legacies of genetic structure during range expansion (Le Corre et al., 1997; Petit, Pineau, et al., 1997). In Quercus robur and Q. petraea, Petit et al. (1997) discovered homogenous patches of chloroplast (cpDNA) haplotypes
6 on the landscape, suggesting that rare long-distance dispersal events followed by local population expansion, rather than a continuous wave of migration, structures landscape diversity in cpDNA. Finer-scale understanding of pollen and seed movement came from Sork and colleagues who demonstrated how landscape configuration and land use history shaped the e ective number of pollen donors (Nep). Fragmentation and habitat loss lowered Nep in wind-pollinated California valley oaks (Q. lobata) (VL Sork, Davis, et al., 2002), while fragmentation actually facilitated gene flow in the insect-pollinated flowering dogwood (Cornus florida) (VL Sork, Smouse, et al., 2005). Studies in Q. robur and Q. petraea demonstrated that pollen-mediated gene flow occurring over long distances (>200 m) is common, accounting for 67% of matings
(Strei , 1999). Studies in Populus have also informed our understanding of gene flow distances. Dispersal of exotic or transgenic alleles is a concern in hybrid Populus plantations. Carefully designed gene flow studies demonstrated that most pollination and seed establishment occurs near the source tree (within 450 m) (DiFazio, Leonardi, et al., 2012), and pollination occurs less frequently when native tree populations are large. Establishment of exotic or transgenic genes in populations of native species is more likely when native population sizes are small. Removing native female trees near exotic or transgenic plantations may be an e ective means of eliminating gene flow in managed landscapes (Meirmans et al., 2010).
Some angiosperm trees (especially in Populus ) are known for extensive clonal reproduction that can strongly influence local genetic structure (Mock, Rowe, et al., 2008), and once established, a set of clones can dominate an area for long periods
(Imbert and Lefévre, 2003). Early studies in black poplar (Populus nigra) assessed the extent of clonal diversity remaining among two remnant populations along the Rhine
7 River in the Netherlands (Arens et al., 1998). Using SSRs, they identified 13 distinct clones among the 71 ramets sampled, suggesting extensive clonality. Recent studies of the highly clonal quaking aspen (P. tremuloides) in the western U.S. also used SSRs to identify and map the spatial extent of clones on the landscape (Mock, Rowe, et al., 2008). Mock, Callahan, et al. (2012) was also able to document the widespread existence of triploid aspen clones in western U.S. landscapes, which represent a unique source of genetic diversity relative to the more recently colonized populations in the glaciated northern regions of the continent (Callahan et al., 2013). The molecular estimates of clone age in P. tremuloides , ranging from 175 to 10,000 years (Ally et al., 2008), suggest that once established, Populus clones may shape the genetic diversity of landscapes for millennia. At larger spatial and temporal scales, the Pliocene and Pleistocene glacial cycles had a substantial impact on the genetic diversity of many North American (VL Sork and Smouse, 2006) and European trees (Hewitt, 1999) through waves of range ex- pansion and contraction, although not all species experienced population size changes (Grivet et al., 2006). Post-glacial migration rates from pollen cores predicted fast tree migration rates out of refugia (100-1000 m/yr), but estimates from genetic data are contradictory (<100 m/yr) (McLachlan et al., 2005; Feurdean et al., 2013), and the increasing recognition of high latitude glacial refugia may refine our estimates of tree migration rates downward (Petit, Hu, et al., 2008; Breen et al., 2012). Studies in trees revealed population size changes and genetic isolation in allopatric refugia create the conditions for speciation (Sorrie and AS Weakley, 2001), as well as opportunities for adaptive introgression upon secondary contact (Bragg et al., 2015; Suarez-Gonzalez et al., 2016). For example, range shifts associated with Pleistocene climate change
8 have been implicated in speciation between Populus trichocarpa and P. balsamifera (ND Levsen et al., 2012), as well as driving adaptive introgression during admixture of divergent European populations of P. tremula (De Carvalho et al., 2010). The development of genomic resources in Populus (Tuskan et al., 2006) allowed for genotyping large numbers of individuals at hundreds to thousands of physically mapped loci (Ingvarsson, 2008; Keller, Olson, et al., 2010; GT Slavov et al., 2012). Applications of coalescent demographic modeling and Bayesian clustering to pop- ulation genomic datasets revealed a general pattern of strong latitudinal gradients of population structure and geographic gradients in genetic diversity (Keller, Olson, et al., 2010; GT Slavov et al., 2012). For example, in P. angustifolia, STRUC- TURE analyses revealed southern, mid-latitude, and northern demes (Evans, Allan, et al., 2015). These confounding e ects of population structure (Fig. 4.1a) introduce challenges for finding adapted loci under environmental selection from variables that covary with latitude (Fig. 4.1b). Hence, understanding the demographic history of a sample became essential for any association genetic study using natural diversity.
Candidate gene studies of local adaptation
Heritable phenotypic clines in phenology, growth, and other physiological traits are widespread in nature (Haldane, 1948; Ingvarsson, García, et al., 2006; Hall, Luquez, et al., 2007; Kempes et al., 2011), suggesting an intimate relationship between adap- tive evolution and the landscape (Linhart and Grant, 1996). In temperate and bo- real forests trees, locally adapted clines of bud phenology (spring bud flush and late summer bud set) allow trees to coordinate their growth and dormancy with the avail- able growing season, and achieve cold hardiness before low temperatures or drought
9 become damaging (Ingvarsson, García, et al., 2006). These phenotypic clines are the hallmark signatures of adaptation to climate, exhibiting strong heritability and genetically-based population di erentiation, these traits are attractive targets for in- vestigating the genetic basis of local adaptation (Savolainen et al., 2007).
The increased knowledge of gene function from model plants, especially Arabidop- sis, coupled with the development of growing genomic resources in emerging an- giosperm model trees (e.g. Populus, Quercus, Eucalyptus, and Castanea;(Neale and Kremer, 2011) facilitated a wave of local adaptation studies in tree bud phenology primarily focused on candidate genes from the flowering time and circadian clock as- sociated pathways. Early e orts to associate adaptive clines in bud phenology with candidate genes were conducted in P. tremula by Ingvarsson and colleagues. PHYB2 senses red to far-red light and regulates several important traits in Arabidopsis, in- cluding hypocotyl elongation, meristem development, and flowering-time (Sheehan et al., 2007). It was hypothesized that changes in allele frequency in PHYB2 would associate with clines in bud set driven by latitudinal variation in critical photope- riod. Latitudinal clines in PHYB2 SNPs were detected, and, consistent with local adaptation, paralleled bud set clines; however, analytical methods were not capable of disentangling whether this was due to local selection or neutral processes (Ingvarsson,
García, et al., 2006). Follow-up studies including SNPs flanking the PHYB2 region found allele frequencies did not deviate from neutral expectations, even though two nonsynonymous SNPs in PHYB2 explained 1.5 - 5% of clinal variation in bud set (Ingvarsson, Garcia, et al., 2008). Hall, Luquez, et al. (2007) leveraged the release of the P. trichocarpa genome (Tuskan et al. 2006) to design 18 neutral SSRs (7 near candidate genes) and sequenced 50 SNPs from five candidate genes (PHYB2,
10 COL2B, GA20OX1, ABI1B, and HYPO312 ) for phenology responses. While phe- nology traits showed strong quantitative genetic di erentiation among populations
(QST ), candidate SNPs and SSRs near candidate genes did not show elevated FST , nor did the genetic data covary strongly with latitude (Hall, Luquez, et al., 2007).
Ma et al. (2010) broadened the focus to include SNPs from 25 genes across the P. tremula photoperiod pathway. Using FST outlier tests for local adaptation, they were unable to di erentiate adaptive from neutral patterns of population di erentiation, even though several candidate SNPs were associated with phenotypes and showed al- lele frequency clines with latitude. Using alternative tests for positive selection based on comparing synonymous vs. nonsynonymous mutations (MK test) and intraspe- cific polymorphism vs. interspecific divergence (HKA test), Hall, MA, et al. (2011) did find evidence for adaptive evolution in several photoperiod and circadian clock associated genes, including PHYB2. These early studies in aspen revealed a complex reality that the genetic basis of local adaptation would be di cult to unravel for highly polygenic traits, and helped usher in two important realizations for the field of adaptive landscape genomics: (1) finding genes involved in local adaptation requires testing candidate genes against samples of loci from across the genome using statistical models that account for the confounding e ects of demographic history (Platt, Vilhjálmsson, et al., 2010); and (2) deducing the genetic architecture of local adaptation would be a di cult road, as the average e ect of a locus on a polygenic trait is small (Rockman, 2012), and selection at the phenotypic level is distributed across many QTLs in the genome (Ti n and Ross- Ibarra, 2014). An enduring lesson for landscape genomics thus became limiting false discoveries while searching for local adaptation by developing, testing, and applying
11 methods that explicitly modeled the demographic history of the sample (Lotterhos and Whitlock, 2014). These new analytical methods included tests for clinal gene- environment associations (GEA), supported by programs such as Bayenv (Günther and Coop, 2013) and latent-factor mixed models (LFMM) (Frichot et al., 2013), as well as FST outlier programs like BayeScan (Foll and Gaggiotti, 2008) that detected diversifying selection against the background of neutral population structure.
In a study of local adaptation for phenology in P. balsamifera, Keller, N Levsen, Olson, et al. (2012) used 412 putatively neutral reference SNPs to model e ects of demographic history and tested for local adaptation in 27 candidate genes in the
flowering-time pathway (Fig. 4.1c, d). FST outlier tests and GEA convergently identified 11 candidate SNPs that had signatures of excessive population structure or association with climate. Interestingly, phytochromes (e.g. PHYA, PHYB1, PHYB2 ) showed no signs of local adaptation in P. balsamifera despite their signature of adap- tation along a latitudinal gradient in the related P. tremula (Ingvarsson, García, et al., 2006; Ma et al., 2010), nor did phytochromes show evidence of species-wide positive selection, as in P. tremula (Hall, MA, et al., 2011; Keller, N Levsen, Ingvars- son, et al., 2011). Instead, local selection in P. balsamifera was primarily targeted at the circadian clock controlled gene GIGANTEA (GI ) (Oliverio et al., 2007), a master regulator of multiple developmental processes such as germination, flowering, and osmotic stress (Mishra and Panigrahi, 2015). GI is known to mediate day length signaling from the phytochromes to regulate expression of the genes CONSTANS and FLOWERING LOCUS T involved in flowering and vegetative phenology in Populus (Böhlenius et al., 2006). Thus, an important comparative insight from these candi- date gene studies was that convergent phenotypic clines in related species may result
12 from selection targeting the same overall functional pathway (e.g. light-regulated circadian responses) but involving di erent genes within the pathway. Indeed, this trend also appears to be true for the evolution of sex-determination in Populus section Populus and Populus section Tacamahaca despite the ancient origin of dioecy in the genus (Geraldes, Hefer, et al., 2015).
More recently, Menon et al. (2015) took advantage of P. balsamifera’s extensive distribution across boreal and arctic latitudes to study natural variation in adaptation to freeze tolerance. Menon et al. combined physiological assays of freeze tolerance using electrolyte leakage tests with gene expression and population genetic analyses on candidate genes in the poplar C-repeat binding factor (CBF) gene family. Their findings indicated that northern genotypes were physiologically more cold tolerant than southern genotypes, and that CBF gene expression was strongly induced by cold in balsam poplar. One gene (CBF2 ) also showed population genetic evidence of selection in the form of an excess of derived alleles, elevated FST , and clinal allele frequency variation. While transcriptional regulation and sequence polymorphisms in
CBF genes were clearly associated with cold adaptation, CBF candidate genes only explain a small fraction of cold-induced changes in gene expression, suggesting the involvement of additional pathways.
Outside of Populus, investigations of the genomics of local adaptation in Quercus and other Fagaceae were underway by the mid-2000’s. Foundational work study- ing QTLs and gene expression of various tissues identified a number of candidate genes involved in landscape variation in bud phenology and drought response (Porth, Koch, et al., 2005; Casasoli et al., 2006; Derory, Léger, et al., 2006; Derory, Scotti- Saintagne, et al., 2010; Soler et al., 2008). These studies spurred a growing body
13 of work in Quercus, and provided an important resource for comparison to Populus and other taxa. In the European oak (Q. petrea), FJ Alberto et al. (2013) integrated
FST -outlier, GEA, and genotype-phenotype association tests based on SNPs from 106 phenology-related candidate genes to test for molecular signatures of local adap- tation along altitudinal and latitudinal gradients. While 34 genes were significant in at least one of the tests, only 4 were identified in more than one: endochitinase class I, seed maturation protein, ribosomal protein L18a, and plastocyanin A. Some candidate genes were significant in both altitudinal and latitudinal gradients, sug- gesting convergent selection at di erent spatial scales (ribosomal protein L18a, GI, Photosystem II polypeptide, 5’-adenylylsulfate reductase, ribosomal protein S11). VL Sork, Squire, et al. (2016) also started with candidate genes from other studies and combined multiple tests for signatures of local adaptation in Q. lobata in Califor- nia, including FST -outlier and univariate and multivariate environmental association tests. Five of seven genes with significant SNPs were significant in more than one test: auxin-repressed protein and –-amylase/subtilisin inhibition, which are involved in bud phenology; and heat shock protein 17.4 and elongation factor 1-–, which are involved in temperature stress response. The FST values of outliers in Q. lobata are substantially higher than those in Q. petraea, suggesting that di erences in popula- tion history and climate heterogeneity may lead to very di erent degrees and patterns of local adaptation. Furthermore, several candidate genes for cold hardiness in Pseu- dotsuga menziesii (Krutovsky and Neale, 2005) and bud phenology in European oaks (Casasoli et al., 2006; Derory, Scotti-Saintagne, et al., 2010) also show evidence of local adaptation in the Californian oak, Q. lobata (VL Sork, Squire, et al., 2016). For the most part, however, candidate genes identified across a limited number of stud-
14 ies were di erent, leaving it unclear how common convergent adaptation is among species and between angiosperm clades (Martins et al., 2018; FJ Alberto et al., 2013; VL Sork, Squire, et al., 2016; Gugger, Cokus, et al., 2016).
The recent publication of the Eucalyptus grandis genome (Myburg et al., 2014) promises to be a useful tool to study adaptive evolution in the species rich and ecologically diverse Eucalyptus. For example, E. pauciflora occurs from sea-level to tree-line and seems like a particularly well-suited species for studying adaptation to temperature, cold hardiness, and drought stress. Adaptation studies in Eucalyptus are in the nascent stages (Dillon et al., 2015), but some traits, e.g. adaptation to aridity (Steane et al., 2014), are better understood. A recent candidate gene study of local adaptation in E. camaldulensis identified associations between climate (especially temperature and evaporative demand) and two genes – ERECTA (an LRR receptor- like kinase) and PIP2 (an aquaporin), both of which also showed excess population di erentiation in FST outlier tests (Dillon et al., 2015). Additional work in E. tricarpa has uncovered extensive evidence of phenotypic plasticity and local adaptation for ecophysiological traits related to drought avoidance, although no associations with molecular data were sought (McLean et al., 2014). We eagerly await further results from landscape genomic studies in Eucalyptus that expand on our current knowledge of candidate genes involved in local adaptation to environmental gradients, especially drought and other forms of osmotic stress, and facilitate comparison with the existing body of work in northern hemisphere forest trees.
15 Genome-wide studies of local adaptation
While candidate gene studies show promise for understanding local adaptation on the landscape, they are still biased by the choice of genes to investigate. The development of new SNP arrays (Geraldes, Difazio, et al., 2013) allowed for genotyping tens of thousands of loci across the genome, while whole genome, exome, or transcriptome re-sequencing studies enabled testing millions of SNPs and identification of locally adapted genomic regions without any a priori specification of the number or types of genes involved (Evans, Allan, et al., 2015; Holliday et al., 2016). To date, genome- wide association studies (GWAS) have received the most attention in P. trichocarpa (McKown, Guy, Klápöt , et al., 2014; McKown, Klápöt , et al., 2014; McKown, Guy, Quamme, et al., 2014; Zhou et al., 2014; Porth, Klapöte, et al., 2013; Evans, GT Slavov, et al., 2014; Holliday et al., 2016). Loci associated with adaptation to climate (Evans, GT Slavov, et al., 2014), aridity (Steane et al., 2014), adaptation across latitude and elevational gradients (Holliday et al., 2016), and associations with adaptive ecophysiological traits (McKown, Guy, Klápöt , et al., 2014) have been discovered. It is interesting to consider the repeatability of the genomic regions implicated by local adaptation scans in P. trichocarpa, given that they have involved di erent research labs, independent germplasm collections across a similar landscape gradi- ent, and di erent statistical approaches to testing for adaptation. We compared local adaptation outliers from four studies in P. trichocarpa based on a range-wide sampling: two studies (McKown, Klápöt , et al., 2014; Geraldes, Farzaneh, et al., 2014) generated genomic data from genes discovered in the Geraldes, Difazio, et al. (2013) SNP array, while Holliday et al. (2016) sequenced whole exomes and Evans,
16 GT Slavov, et al. (2014) sequenced whole genomes. Gene annotations from Geraldes,
Farzaneh, et al. (2014) were updated from the P. trichocarpa genome version 2.2 to 3.0. Our comparisons were based on unique P. trichocarpa gene IDs identified as lo- cally adapted in each study. These include 579 genes from Geraldes et al. (2014), 276 from McKown, Klápöt , et al. (2014c), 1160 from Holliday et al. (2016), and 452 from Evans et al. (2015) (Fig. 1.2). No gene was identified as a selection outlier by all four studies, however six genes were identified as outliers in common across combinations of three studies (Table 1.1). Two of these shared outliers were transcription factors, while a third was a methyltransferase, possibly indicating an important role for tran- scriptional and epigenetic regulation of gene expression in local adaptation. One gene, Potri.010G165700, a MYB-like DNA binding protein, showed upregulated expression in dormant and pre-chilling vegetative and floral buds (www.popgenie.org), a profile supportive of this gene’s association with bud phenology in GWAS (McKown et al. 2014c). New candidates for local adaptation have also been found using whole-transcriptome sequences (ignoring expression) from natural populations to associate coding SNPs with environmental variation on the landscape. Using this approach in 22 Q. lobata individuals produced 220,427 SNPs for analysis (Gugger, Cokus, et al., 2016). Of these, 79 SNPs from 50 genes were significantly associated with climate gradients, especially minimum temperature and precipitation variables, and most of these genes were di erentially expressed after drought stress (Gugger, Peñaloza-Ramírez, et al., 2017), suggesting they are likely involved in local adaptation. A few SNPs are in genes with known roles in response to stimulus or stress (e.g., SAUR-like auxin-responsive protein family), cold shock protein binding (zinc knuckle family protein), light re-
17 sponse or photosynthesis (e.g., cryptochrome 1), and trichome development (Myosin family protein with Dil domain) in Arabidopsis (Swarbreck et al., 2007). Furthermore, the 50 candidate genes have elevated levels of genetic diversity, consistent with bal- ancing selection or other mechanisms favoring di erent alleles in di erent contexts.
In a similar population RNASeq study in Populus balsamifera, transcriptome-wide gene expression was compared between northern and southern genotypes grown un- der common garden conditions (Wang et al., 2014). A total of 4,018 of 32,466 genes showed di erential expression between northern and southern genotypes. Genes that were more highly expressed in southern genotypes included homologs of the Ara- bidopsis flowering time pathway (EARLY FLOWERING 3, FLOWERING LOCUS C, PHYA and CAULIFLOWER). However, unlike in the Quercus study of Gugger, S Fitz-Gibbon, et al. (2016b), the di erentially expressed genes did not harbor SNPs that showed signatures of local selection (based on F ST or GEA outlier analyses), sug- gesting that expression divergence is largely uncoupled from nucleotide divergence in
P. balsamifera. While the underlying goal of these studies is to identify and understand patterns of adaptive genomic evolution, they also simultaneously address fundamental ques- tions of how genomic diversity is shaped in geographic space. For example, Geraldes et al. (2014), found that diversifying selection along a latitudinal gradient princi- pally was enriched for phenology and photoperiod related genes. At the same time, they addressed a classic landscape genetic question about geographic isolation along riparian zones by demonstrating that populations in connected drainages are more genetically similar than between drainages. Similarly, Holliday et al. (2016) identified genes associated with climatic di erences in growing season length by testing whether
18 adaptation involves selection targeting the same genomic regions between latitudinal and altitudinal gradients in P. trichocarpa. Their finding of significant overlap in the outlier regions associated with latitude and altitude provided rare support for the hypothesis of parallel adaptation along similar environmental gradients in di erent geographic regions of the species range. Merging the unique biology of angiosperm trees with the landscape genetics/ge- nomics program has generated a deep understanding for the connection between land- scapes, biodiversity, and the evolutionary processes of gene flow, genetic drift, and natural selection. The initial goals of the field, as outlined by Manel, Schwartz, et al. (2003), continue to be relevant and guide research today. Despite analytical hur- dles that still exist, applications of new sequencing technologies combined with more refined analysis methods are generating exciting discoveries.
1.1.2 Challenges and Solutions for Landscape
Studies in Trees
Many landscape genomic studies have borrowed sampling schemes from phylogeogra- phy, the study of population lineages across large (e.g. continental) landscapes (Fig. 4.1a). This approach has yielded much information about how genetic and pheno- typic variation aligns along environmental gradients (e.g. photoperiod, temperature, and precipitation) that vary across large geographic regions. However, range-wide sampling can produce methodological challenges, false positive associations due to uncontrolled population structure, and biases in our expectations of which genomic regions are locally adapted. In this section, we review some of the challenges currently
19 facing landscape genomics and highlight potential solutions.
Collinear environmental, phenotypic, and genotypic axes
The biological characteristics of trees that make them good candidates for landscape and ecological genomic studies also generate challenges for studies of natural variation that require clever study designs and/or novel methodologies to solve. The collinear variation of environmental, phenotypic, and genomic variation across landscapes is a major challenge to overcome (Fig. 4.1a, b). For many temperate and boreal species, the structure of neutral loci following population expansion from glacial refugia mim- ics the signal of adaptive selection searched for by association methods (Lotterhos and Whitlock, 2014). Disentangling false positives arising from demographic history against true positives arising from selection creates a problem for treating landscape samples as ‘natural experiments’. Algorithms accounting for demography in a sample of loci have been developed to reduce the false-discovery rate (FDR) due to population structure, while maintaining power to avoid false negatives (WY Yang et al., 2012; Günther and Coop, 2013; Fri- chot et al., 2013; Berg and Coop, 2014). However, deciding where to place a cut-o value for significance tests remains di cult, given the very large number of tests in most population genomic datasets. Recently, it was recommended that researchers create empirical p-values from test statistic distributions derived from putatively neu- tral loci (Lotterhos and Whitlock, 2014). While it is challenging to know what kind of locus is truly neutral (e.g. the promoter for TB1, a gene in the QTL that reorga- nizes teosinte-like to maize-like branching, is 41 kb upstream of the gene; Clark et al.,
2006), selecting intergenic loci and conditioning p-values of candidate loci from them
20 should decrease the FDR markedly (Lotterhos and Whitlock, 2014). Additionally, sampling individuals from regions of the landscape that have experienced the same demographic history and have low FST can greatly minimize false positives generated by demography, particularly if covariation of the ecological and environmental vari- ables of interest is present. Thus, future studies on the genomics of local adaptation will benefit from careful consideration of sampling design and generation of appropri- ate null expectations, both in terms of individuals on the landscape and SNPs across the genome.
Validating adaptive associations discovered in natural populations
Local adaptation studies are generating many associations increasing our knowledge of gene functions (McKown, Klápöt , et al., 2014); however, attempts to validate these loci using independent evidence is still limited (∆aliÊet al., 2016). For the most part, the candidate genes identified across even the most robust studies are di erent.
Our comparison of four local adaptation studies in P. trichocarpa (McKown, Klápöt , et al., 2014; Evans, GT Slavov, et al., 2014; Geraldes, Farzaneh, et al., 2014; Holliday et al., 2016) using broadly overlapping genomic data sets find few loci in common (Fig. 1.2). What portion of the many associations not shared between studies result from false positives, di erences in the biology of the samples, or methodological ap- proaches employed? This brings into focus the growing need for functional validation of adaptation outliers. Functional validation of genes identified from GWAS can be di cult, however, when successful, the results are compelling (e.g. Meijón et al., 2014). Genetic ap- proaches to gene validation in trees is possible (Kim et al., 2011), and the advent
21 of CRISPR/Cas9 make editing putatively adaptive and non-adaptive alleles of can- didate genes even more feasible (Fan et al., 2015; Bono et al., 2015). Thus, we are optimistic that validation of the most promising candidates using transgenic methods will be an increasing trend in the future. Another validation approach for local adaptation studies is to replicate the associ- ation, for example by testing the phenotypic e ect of the candidate locus segregating within a pedigreed mapping family (Stinchcombe and Hoekstra, 2008). Trees typi- cally have long generation times and waiting upwards of 4-8 years (for example in
Populus, DiFazio, GT Slavov, et al., 2011) for selected lines to reach reproductive maturity is not uncommon, and producing advanced generation mapping families is di cult. Nevertheless, concerted e orts have produced a handful of advanced genera- tion pedigrees for linkage trait mapping in angiosperm trees in Populus (Taylor, 2002) and several members of the Fagaceae, notably Quercus, Fagus, and Castanea (Kremer et al., 2012). Long-term mapping populations comprised of hundreds of progeny are now available for Q. alba at the University of Tennessee, Knoxville and University of Missouri Center for Agroforesty (http://hardwoodgenomics.org/), as well as for Q. robur in Europe (Bodénès et al., 2012; Plomion et al., 2016). However, researchers remain limited when established mapping families do not segregate for the alleles or traits of interest. Nature, however, has provided additional solutions to some of these problems, particularly in taxa that extensively hybridize with congeners, allowing for strategic admixture mapping from natural hybrid zones.
In these zones, naturally occurring F1s can be selected to produce F2 or backcross recombinant generations. In Populus, a well known tri-hybrid zone exists in southern Alberta between P. balsamifera, P. deltoides, and P. angustifolia (Floate, 2004).
22 Aided by diagnostic SNP genotyping arrays that identify Populus species and their hybrids (Isabel et al., 2013), hybrid F1s as well as every combination of backcrossed
F2s can be found there (Floate et al., 2016). Hybrid zones between P. balsamifera and P. trichocarpa are also well known (Geraldes, Farzaneh, et al., 2014), and in Wyoming and Colorado another tri-hybrid zone has been observed between P. balsamifera, P. trichocarpa, and P. angustifolia (Chhatre et al., 2018). Using the diverse array of interspecific hybrids as parents in targeted crosses has the potential to accelerate the time to production of advanced generation mapping families.
Investment in phenotyping
The exponential decline of genotyping costs (Manel, Perrier, et al., 2016) has shifted the data collection bottleneck in ecological and landscape genomic studies towards phenotyping. Phenotyping traits representative of the diversity of developmental stages and tissue types can be di cult for trees, given their large size, delayed re- productive maturity, massive root systems, and the high cost of establishing and maintaining long-term common gardens. Increased attention is being devoted to de- veloping standardized methods for high-throughput phenotyping (Brown et al., 2014), and automated leaf morphology software developed for Populus (Bylesjö et al., 2008) has been successfully applied to other species (IvetiÊet al., 2014). Similarly, re- searchers are beginning to investigate the complexity and adaptive contribution of below-ground tissues (Madritch et al., 2014) and their symbiotic interactions (Hac- quard and Schadt, 2015). However, renewed attention needs to be paid to funding and maintaining common gardens, sharing and integrating data across gardens for purposes of meta-analysis, and innovating new methods and approaches to capturing
23 phenotypes from common gardens and natural landscapes. Common gardens play a critical role in landscape genomic studies and connect genotype to phenotype to environment (Porth, Klapöte, et al., 2013; Evans, GT Slavov, et al., 2014). Gardens with diverse landscape-scale sampling have been es- tablished in Populus, including extensive collections of P. trichocarpa by di erent research groups (448 individuals from McKown et al. 2014c, >1,000 individuals from Evans et al. 2015, and 391 individuals from Holliday et al. 2016), as well as the Ag-
CanBap collection of P. balsamifera (525 individuals; Soolanayakanahally, Guy, Silim, Drewes, et al., 2009; Soolanayakanahally, Guy, Silim, and Song, 2013), the Swedish
Aspen collection (SwAsp) of P. tremula (116 individuals; Luquez et al., 2008), col- lections of P. fremontii, P. angustifolia, and their hybrids (hundreds of individuals; Holeski et al., 2013), and the Chinese P. tomentosa collection (460 individuals; Du et al., 2014). In oaks, a comprehensive provenance trial for Q. lobata was established in 2013 at two sites from seeds sourced from 95 locations throughout its distribution (Delfino-Mix et al., 2015). Funding cycles, however, are too short to support the long-term maintenance needed for tree studies, and diverse collections of provenance trials established in the past century have been abandoned and overlooked. Few trials are still maintained, but interest is rebounding in taking advantage of these historical resources. For example, a mature provenance trial of approximately 300 trees from
30 populations sampled across the range of the tuliptree (Liriodendron tulipifera)was planted in the mid-1980’s in Chapel Hill, NC, but was orphaned for much of its ex- istence (Fetter and A Weakley, 2019). Older, but smaller, trials are also available for
Q. alba , including one comprised of 23-year-old trees from 17 populations planted in three locations in Indiana (Huang et al., 2015).
24 While the challenge of establishing and maintaining common gardens will likely remain in landscape genomics, we need to start thinking more strategically about how to use and combine these resources, and where applicable, study tree pheno- types in natura. Aggregating trait data from many common gardens would prove useful for meta-analyses of adaptive phenotypic trends across spatial scales, times, and even closely related species. Comparative work of this type has begun in Pop- ulus (Soolanayakanahally, Guy, Street, et al., 2015), but more awaits. Finding new opportunities to collaborate and integrate data across research labs hosting common gardens is needed, and tree biologists should target funding mechanisms to build research coordination networks around common garden resources. Landscape genomics would also benefit from testing the transferability of genotype- phenotype associations from common gardens to the field. The challenge of landscape phenomics will define the next generation of studies that seek to close the genotype- phenotype-environment triangle of adaptive association. The hurdles, while logisti- cally challenging, are largely technological, and progress in this area is essential to make meaningful applications of tree management and conservation of genetic re- sources under environmental change (Pettorelli et al., 2014). Exciting advances in remote sensing are now being applied to phenotyping forest trees at scales of clonal stands to entire landscapes (Homolova et al., 2013). For example, remote sensing of seasonal phenology is being deployed at fine (Toomey et al., 2015) to continental (Graham et al., 2010) scales, and satellite-based phenology products such as MODIS provide coarse landscape-scale phenotyping of phenology that correlates with ground- based measurements in Populus (Elmore et al., 2016). This opens doors to connecting landscape variation in forest phenology with adaptive gradients in allele frequencies
25 in genomic regions associated with phenology under controlled experimental condi- tions. At finer scales of individual trees, measurement of several key phenotypes (e.g. spring bud flush and plant water balance) may be possible through use of field de- ployed sensors that detect di erences in tree mass using acceleration or reflectance using light-emitting detectors (e.g., Kleinknecht et al., 2015). While still in need of further development, landscape-scale phenotyping holds the potential to connect adaptive genomic variation to forest productivity in natural stands, and is a challenge that future studies will surely begin to address.
1.1.3 Future Research Directions for Forest Tree
Landscape Genomics
Refinements in genomic sequencing, association methods accounting for demography, and inclusion of broader evolutionary questions has driven the steady increase of the impact of landscape genomic studies using angiosperm trees. We make the following recommendations and predictions for the future of the field:
Growing genomic resources outside of Populus
Genomic resources in Populus are now well-developed and easily accessible (e.g. www.popgenie.org), but are still lacking for many other angiosperm trees. Increasing genomic resources in other taxa will allow for more comparative studies that will increase the impact of what is known from Populus (see Sollars and Buggs, this vol- ume). Genomic resources in Quercus are steadily increasing and are most developed for the white oaks (Quercus section Quercus). Recently, the first draft oak genome,
26 based on a combination of Roche 454, Illumina, and Sanger sequences, was announced for the ecologically and economically important European white oak, Quercus robur (Plomion et al., 2016). The genome contains 18 k sca olds (>2 kb) totaling about 1.3 Gb, which is about 1.7 times the haploid genome size (Kremer et al., 2012). To enable research in ecological genomics, there are several important improvements underway. These include the development of a genome sequence with collapsed haplotypes to facilitate SNP calling, further assembly that utilizes linkage maps and additional se- quencing to determine gene order, and structural and functional annotation (Plomion et al., 2016) that will leverage abundant expressed sequence tags (EST) and RNA-Seq libraries from a variety of tissues (Ueno et al., 2010; Tarkka et al., 2013; Lesur et al., 2015).
A draft oak genome is now available for Q. lobata (VL Sork, ST Fitz-Gibbon, et al., 2016), a threatened western North American species with a population his- tory that stands in contrast to many studied trees (Grivet et al., 2006); entirely on Illumina sequencing, two complementary assemblies were produced. The first (v0.5) aggressively collapses haplotypes to optimize its use for SNP calling (Gugger, S Fitz- Gibbon, et al., 2016). The second (v1.0) retains haplotypes and is composed of 18.5 k sca olds (> 2 kb) with total length of 1.15 Gb (VL Sork, ST Fitz-Gibbon, et al.,
2016). This latter assembly is similar in both size and sequence to that of Q. robur , with 93% sequence identity and 97% of sca olds reciprocally aligning. Functional and structural annotations are derived from an annotated reference transcriptome assembly for Q. lobata (Cokus et al., 2015) in addition tode novo annotations. Other resources for eastern North American trees are available from the Hardwood Genomics Project (http://www.hardwoodgenomics.org/) and from the Fagaceae Ge-
27 nomics Web (http://www.fagaceae.org) for many important tree species, including oaks, chestnuts, sweet gum, beech, and tuliptrees, among others. These include EST libraries and transcriptome assemblies for Q. alba and Q. rubra that were generated from a variety of tissues and sequencing platforms, as well as a low-coverage genome assembly for Q. alba that was constructed for identifying SSRs. Early analyses sug- gest high similarity with other oaks and broad synteny within Fagaceae (Kremer et al., 2012; Bodénès et al., 2012; Cokus et al., 2015). Along with the development of sequence resources, there has been an explosion in the number of SNPs and SSRs available. Over one million SNPs were discovered in
California oaks, especially Q. lobata (Cokus et al., 2015; Gugger, Cokus, et al., 2016). Tens of thousands of SNPs are known in Q. robur (Ueno et al., 2010), and recently, a cross-species genotyping array with over 7k SNPs was developed for European oaks
(Lepoittevin et al., 2015). In addition, thousands of SSRs have been identified in Q. alba and Q. robur (Durand et al., 2010; Bodénès et al., 2012).
Incorporating gene expression studies
Advances in transcriptomics (JH Lee et al., 2015) and isoform detection (Bernard et al., 2015) are making it possible to ask new questions about the genomic mechanisms of local adaptation and the evolution of phenotypes. Integrating gene expression into selection scans can provide complementary information to SNP genotypes when con- ducting selection scans and GWAS. Experimental designs that analyze di erential expression dependent on both treatment and source population can produce excel- lent candidates for inferring local adaptation. In a small greenhouse experiment that exposed Q. lobata seedlings from three source populations to a drought treatment,
28 whole-transcriptome gene expression changes identified 56 genes that had significant interaction terms (Gugger, Peñaloza-Ramírez, et al., 2017). These genes are primar- ily involved in metabolic processes, stress response, transport/transfer of molecules, signaling, and transcription regulation. Given the prevalence of molecules involved in metabolism, it was hypothesized that local adaptation to drought in oaks may involve di erent abilities to metabolize or di erent ways of altering metabolic activities in the face of drought. Di erentially expressed genes or isoforms may experience di erent forms of se- lection (Hartfield, 2016). Some of these expression di erences may be linked to sex function in dioecious species or monoecious species with imperfect flowers (Geraldes, Hefer, et al., 2015). Analysis of di erential expression during the development of imperfect flowers in Betula revealed a transcription factor, BPMADS2, involved in specifying petal and stamen identity and is expressed only in developing male flowers (Järvinen et al., 2003). Local adaptation in the expression of this transcription factor may allow for precise timing of male flower development.
Testing the role of epigenetics in local adaptation
NGS is enabling research on the potential role of epigenetic mechanisms of local adaptation in trees. Cytosine methylation varies among individuals and populations (Schmitz et al., 2013). It can arise through random mutation (Becker, Hagmann, et al., 2011), be induced by the environment (Verhoeven et al., 2010), can be stably inherited (Becker and Weigel, 2012), and may be an additional source of variability for local adaptation. Early evidence in oaks shows that single methylation variants (SMVs) in CpG contexts are exceptionally di erentiated among populations (Platt,
29 Gugger, et al., 2015). In addition, CpG SMVs have strong associations with climate and specific variants show evidence of local adaptation to temperature stress (Gugger, Cokus, et al., 2016). Therefore, it is hypothesized that either CpG SMVs are markers for loci involved in local adaptation or they are under divergent natural selection themselves. Additional research is needed to better understand the mutation process, assess heritability, and link methylation to phenotypes.
Genomic predictions of climate change and local adaptation
Understanding the genomics of local adaptation to current environmental conditions allows us to ask how populations might respond when climates change (V Sork et al., 2013; Wang et al., 2014; Fitzpatrick and Keller, 2015). For example, research in
Arabidopsis thaliana suggests that the fitness of populations under novel conditions can be predicted from SNPs in genes involved in local adaptation (Hancock et al., 2011). 2011). Prediction of fitness from SNPs might be powerfully applied to tree genomics in studies that pair genotypes with growth traits from common garden trials. Furthermore, the future genetic composition of populations under climate change can be modeled, and the degree of mismatch between current and projected genomic composition can be considered in management planning (Fitzpatrick and Keller, 2015). Identifying adaptive genotypes under projected climate change may be critical for conservation of biodiversity and economically important genotypes, core arguments in favor of assisted migration (Aitken and Bemmels, 2016).
30 Assessing the prevalence of adaptive introgression
Oaks, poplars, and many other angiosperm trees are notorious hybridizers with large e ective population sizes readily capable of generating adaptive molecular variation. An important question for tree genomics is how species are able to maintain their ecological, morphological, and genomic identities in the face of widespread hybridiza- tion, and whether hybridization contributes actively to adaptation in hybrid zones and outside of them. Genomic tools o er unprecedented opportunities to investigate the genes relevant to speciation (Wu 2001; Bawa and Holliday, this volume), whether hybridization facilitates the transfer of beneficial alleles across species boundaries (Suarez-Gonzalez et al., 2016), the ecological factors that promote or restrict hy- bridization (Ortego et al., 2014), and how important these processes are for evolution in trees (Gailing 2014, Gailing and Curtu 2014). Evidence from European poplars suggests that advanced generation hybrids ac- cumulate genetic incompatibilities that not only decrease their fitness, but are fre- quently lethal (Christe et al., 2016). Nevertheless, unidirectional or biased introgres- sion of genes occurs frequently in angiosperm trees (Thompson et al., 2010; Geraldes, Farzaneh, et al., 2014) and fragments of nDNA can introgress across hybrid zones
(Martinsen et al., 2001). In P. tremula , adaptive introgression from Russian to Swedish populations may have contributed to adaptive shifts in the phenology of Swedish trees (De Carvalho et al., 2010). Similarly, the high rate of hybridization in oaks has long been a topic of interest (CH Muller, 1952), and has influenced the development of the ecological species concept (Van Valen, 1976). One promising area for future study is to investigate the role of introgression in responses to pathogens or other sources of plant stress (e.g., cold, drought, soil-metal
31 tolerance). What role does adaptive introgression play when interspecific populations meet in an environment where hybridization increases pathogen or herbivore pressures (Holeski et al., 2013; Busby et al., 2013)? Do genetic incompatibilities accumulate in hybridizing populations that limit the ability for adaptive immune responses to track the movement of pathogens/herbivores as they ’step’ across species boundaries
(Floate et al., 2016)? Some evidence from Populus protease inhibitor genes hints that introgression may accompany adaptive responses to herbivore enemies (Neiman et al., 2009). When multiple hybrid zones exist between the same set of species, does unidi- rectional gene flow create a ’sink species’, biasing the pattern of introgression, or does local selection play a dominant role regulating which genomic regions preferentially introgressed? Geraldes, Farzaneh, et al. (2014) hypothesize that hybridizing popula- tions of P. trichocarpa and P. balsamifera in Alaska may be adaptively introgressing cold tolerance or drought tolerance genes that would experience negative selection in the southern hybrid zone. Trees seem to be ideal systems studying the interaction between hybridization and natural selection.
Seeking out understudied systems and questions
The studies we have discussed o er initial insights into how tree genomes evolve in response to ecological selection pressures, but much work remains to thoroughly un- derstand the molecular basis of local adaptation. Much of the research to date has emphasized local adaptation with regard to bud phenology, growing season length, and other sources of abiotic stress. Biotic interactions and non-climate-related envi- ronmental adaptation have received far less attention in landscape studies, particu- larly in the area of pathogen responses, which is an area of critical importance for
32 human economies and culture (e.g. co ee trees and leaf rust, Avelino et al., 2012; the banana and Panama disease, Ordonez et al., 2015). Thus, additional research is needed to understand local adaptation in a broader array of contexts, especially those focused on co-evolution and reciprocal selection driven by species interactions (Whitham et al., 2008). Trees, as foundational species that structure biodiversity, are ideal subjects for community-level genomic studies. We must also expand our geographic and taxonomic focus to include understud- ied regions, environments, and taxa. A quick survey of the angiosperm (and non- angiosperm) tree landscape genetics literature reveals a strong geographical bias for boreal forest, and, increasingly, temperate forest trees. Although trees from these regions hold much of the Earth’s biomass, a broader taxonomic and geographic sam- pling of forest trees is needed to understand the diversity of adaptation in natural landscapes. Tropical forests harbor incredible ecological diversity (Gentry, 1988) and the evolutionary findings from boreal and temperate trees regarding the selective envi- ronments and the genomic pathways of adaptation in trees may be the exception, not the rule. Negative density dependent selection, principally from fungal pathogens (Augspurger, 1984), acts to keep tropical tree diversity high and abundance low (Bagchi et al., 2014). What e ect does negative density dependent selection from pests and pathogens have on patterns of adaptive genomic diversity in tropical trees? How is diversifying selection for genes involved in an immune response achieved in the face of limited gene flow and low abundance (Hamilton, 1999)? Are large e ec- tive population sizes su cient to maintain the supply of standing variation and new mutations required for trees to keep pace in the arms race against pathogens and herbivores? Are the spatial scales of local adaptation to pathogens small, moderate,
33 or large in tropical forests?
Revisiting the scale and scope of landscape genomic studies
Understanding the genetic determinants of locally adaptive clines is a long-standing goal of landscape genetics (Manel, Schwartz, et al., 2003). The large-scale, often range-wide sampling schemes used to investigate clines have become standard in the tree genomics literature. Multiple studies have discovered adaptive evolution of genes associated with phenology clines (Keller, N Levsen, Olson, et al., 2012; McKown, Klápöt , et al., 2014; Evans, GT Slavov, et al., 2014; Geraldes, Farzaneh, et al., 2014), and indeed, given the high broad-sense heritability of phenology traits (McK- own, Guy, Klápöt , et al., 2014) and their direct connection to fitness, one expects strong local selection. Additional ecological dimensions to local adaptation should receive greater focus, including di erent soil chemistries and biotic interactions that vary across landscapes. A modified scope and sampling design is needed to detect lo- cal adaptation that does not vary clinally. Designing studies to more precisely isolate di erent environmental gradients can help disassociate correlated predictors that are common in range-wide studies. For example, temperature and photoperiod both typi- cally covary with latitude, and a researcher interested in temperature adaptation may have trouble interpreting results from a sampling design oriented along a north-south gradient. However, a sample from di erent elevations along a gradient, while keeping latitude constant, will remove e ects of photoperiod. Similar designs have been em- ployed recently to detect parallel adaptation to climate across altitudinal transects in P. trichocarpa (Holliday et al., 2016). Smaller scale, targeted landscape genomic studies have much to o er and can unravel new dimensions of local adaptation.
34 1.2 Conclusions
The challenge of understanding genomic variation responsible for adaptive pheno- types on the landscape has been fully engaged for more than a decade by scientists working with angiosperm forest trees. Sequencing and methodological advances have enabled a deeper understanding of neutral and adaptive variation, but the field stands short of proving causative relationships between genotype, phenotype, and the envi- ronment. The breadth of evolutionary questions continues to grow, as does the tax- onomic sampling, and the field is expanding the availability of genomic resources for a broader array of economically and ecologically important forest trees. Identifying unique geographic hotspots of genomic diversity on landscapes, even before we fully understand their adaptive significance, is a valuable contribution to conservation of tree germplasm during climate change. However, careful consideration of research questions and the sampling design most likely to yield interpretable results is critical and will allow for more diverse studies across the many ecological dimensions of local adaptation. Angiosperm trees will remain important models for studies of adaptive evolution in the coming years, and the combination of landscape genomics with vali- dation methods have the potential to make significant contributions to conservation, sustainable development, and our understanding of the natural world.
1.3 Acknowledgements
Support for this work was provided by National Science Foundation (NSF) awards IOS-1461868 to Keller and IOS-1444611 to Gugger. In addition, we appreciate the
35 thoughtful comments of the editors, Andrew Groover and Quentin Cronk.
1.3.1 Author Contributions
KCF and SRK conceived the scope of the work. KCF performed the statistical analyses, wrote, and edited the manuscript. SRK provided extensive edits of the manuscript. PFG wrote the text specific to Quercus.
36 1.4 Tables
Table 1.1: Annotations for six genes implicated in local adaptation from range-wide samples of Populus trichocarpa by four separate studies. (M) McKown et al. (2014c), (G) Geraldes et al. (2014), (H) Holliday et al. (2016), (E) Evans et al. (2014). Annotations retrieved from www.popgenie.org.
Gene model Arabidopsis homolog Gene function Overlap Potri.010G212900 AT5G13560 NA M, H, E Potri.005G073000 AT5G65270.1 GTP binding G, M, E Potri.006G257000 NA membrane protein G, M, E Potri.010G165700 AT3G01140 transcription factor G, M, E Potri.006G225600 AT2G26200 methyltransferase G, M, H Potri.009G017400 AT2G35940.1 transcription factor HOX domain proteins G, M, H
37 1.5 Figures
38 Figure 1.1: Populations for landscape genomic studies are frequently sampled out of a con- text of spatially varying genomic structure and environmental change. The genetic struc- ture of P. balsamifera is represented from a sample from 412 reference SNPs genotyped by (Keller et al. 2010) with three demes observed: low diversity central and western demes, and a higher diversity eastern deme (a). Environmental gradients abound in nature, and green- ness onset is a classic example of a biological phenomenon, controlled by both photoperiod and temperature, that changes across large geographic areas (b). Eliminating false positives in selection scans is always challenging; however, fewer false discoveries are observed from the tail of empirical p-value distributions derived from putatively neutral reference SNPs (c). This approach is demonstrated with the results of a BayeScan analysis (Foll and Gaggiotti, 2008) of 412 reference SNPs and 335 candidate SNPs from 27 flowering-time genes that were sampled from the localities in (a). Setting a threshold conditioned with empirical p- values identifies local selection in SNPs from GIGANTEA 5 (GI5), EARLY BOLTING IN SHORT DAYS (EBS), PHYTOCHROME-INTERACTING TRANSCRIPTION FACTOR 3 (PIF3), andCASEIN KINASE II BETA CHAIN 3 (CKB3.4). Allele frequency variation in four SNPs from GI5 are associated with latitude (d). GI5 is involved in light signaling to the circadian clock and local adaptation of these alleles may allow precise timing of phe- nology. Figures reproduced, with permission, from M. Fitzpatrick, (a); A. Elmore (b); S. R. Keller, (c, d).
39 McKown Holliday
Geraldes 182 1116 Evans 4 80 19 465 414 2 1
0
18 4
0 3 11
Figure 1.2: Overlap of candidate genes for local adaptation inPopulus trichocarpa identified in selection scans by Geraldes et al. (2014), McKown et al. (2014)c, Holliday et al. (2016), and Evans et al. (2014). No overlap was detected by all studies, but six genes overlapped among three studies (underlined). Version 2.2 Gene annotations from Geraldes et al. (2014) were updated to 3.0 for comparisons.
40 References
Aitken, SN and JB Bemmels (2016). “Time to get moving: assisted gene flow of forest
trees”. In: Evolutionary Applications 9.1, pp. 271–290. Alberto, FJ, J Derory, C Boury, JM Frigerio, NE Zimmermann, and A Kremer (2013). “Imprints of natural selection along environmental gradients in phenology-related
genes of Quercus petraea”. In: Genetics 195.2, pp. 495–512. Ally, D, K Ritland, and S Otto (2008). “Can clone size serve as a proxy for clone
age? An exploration using microsatellite divergence in Populus tremuloides”. In: Molecular Ecology 17.22, pp. 4897–4911. Anderson, MK (2013). Tending the wild: Native American knowledge and the man- agement of California’s natural resources. University of California Press. Arens, P, H Coops, J Jansen, and B Vosman (1998). “Molecular genetic analysis of
black poplar (Populus nigra L.) along Dutch rivers”. In: Molecular Ecology 7.1, pp. 11–18. Augspurger, CK (1984). “Seedling survival of tropical tree species: interactions of
dispersal distance, light-gaps, and pathogens”. In: Ecology 65.6, pp. 1705–1712. Avelino, J, A Romero-Gurdián, HF Cruz-Cuellar, and FA Declerck (2012). “Landscape context and scale di erentially impact co ee leaf rust, co ee berry borer, and co ee
root-knot nematodes”. In: Ecological applications 22.2, pp. 584–596. Bagchi, R, RE Gallery, S Gripenberg, SJ Gurr, L Narayan, CE Addis, RP Freckleton, and OT Lewis (2014). “Pathogens and insect herbivores drive rainforest plant
diversity and composition”. In: Nature 506.7486, p. 85.
41 Becker, C, J Hagmann, J Müller, D Koenig, O Stegle, K Borgwardt, and D Weigel
(2011). “Spontaneous epigenetic variation in the Arabidopsis thaliana methylome”. In: Nature 480.7376, p. 245. Becker, C and D Weigel (2012). “Epigenetic variation: origin and transgenerational
inheritance”. In: Current opinion in plant biology 15.5, pp. 562–567. Berg, JJ and G Coop (2014). “A population genetic signal of polygenic adaptation”.
In: PLoS genetics 10.8, e1004412. Bernard, E, L Jacob, J Mairal, E Viara, and JP Vert (2015). “A convex formulation for joint RNA isoform detection and quantification from multiple RNA-seq samples”.
In: BMC bioinformatics 16.1, p. 262. Block, WM, ML Morrison, and J Verner (1990). “Wildlife and oak-woodland inter-
dependency”. In: Fremontia 18.3, pp. 72–76. Bodénès, C, E Chancerel, O Gailing, GG Vendramin, F Bagnoli, J Durand, PG Goicoechea, C Soliani, F Villani, C Mattioni, et al. (2012). “Comparative mapping
in the Fagaceae and beyond with EST-SSRs”. In: BMC plant biology 12.1, p. 153. Böhlenius, H, T Huang, L Charbonnel-Campaa, AM Brunner, S Jansson, SH Strauss, and O Nilsson (2006). “CO/FT regulatory module controls timing of flowering and
seasonal growth cessation in trees”. In: Science 312.5776, pp. 1040–1043. Bono, JM, EC Olesnicky, and LM Matzkin (2015). “Connecting genotypes, pheno- types and fitness: harnessing the power of CRISPR/Cas9 genome editing”. In:
Molecular ecology 24.15, pp. 3810–3822. Bragg, JG, MA Supple, RL Andrew, and JO Borevitz (2015). “Genomic variation
across landscapes: insights and applications”. In: New Phytologist 207.4, pp. 953– 967.
42 Breen, AL, DF Murray, and MS Olson (2012). “Genetic consequences of glacial sur-
vival: the late Quaternary history of balsam poplar (Populus balsamifera L.) in North America”. In: Journal of Biogeography 39.5, pp. 918–928. Brown, TB, R Cheng, XR Sirault, T Rungrat, KD Murray, M Trtilek, RT Furbank, M Badger, BJ Pogson, and JO Borevitz (2014). “TraitCapture: genomic and envi-
ronment modelling of plant phenomic data”. In: Current opinion in plant biology 18, pp. 73–79.
Burns, RM and BH Honkala (1990). Silvics of north America. Vol. 2. United States Department of Agriculture Washington, DC. Busby, PE, N Zimmerman, DJ Weston, SS Jawdy, J Houbraken, and G Newcombe
(2013). “Leaf endophytes and Populus genotype a ect severity of damage from the necrotrophic leaf pathogen, Drepanopeziza populi”. In: Ecosphere 4.10, pp. 1–12. Bylesjö, M, V Segura, RY Soolanayakanahally, AM Rae, J Trygg, P Gustafsson, S Jansson, and NR Street (2008). “LAMINA: a tool for rapid quantification of leaf
size and shape parameters”. In: BMC Plant Biology 8.1, p. 82. ∆aliÊ, I, F Bussotti, PJ Martínez-García, and DB Neale (2016). “Recent landscape ge-
nomics studies in forest trees—what can we believe?” In: Tree genetics & genomes 12.1, p. 3. Callahan, CM, CA Rowe, RJ Ryel, JD Shaw, MD Madritch, and KE Mock (2013). “Continental-scale assessment of genetic diversity and population structure in
quaking aspen (Populus tremuloides)”. In: Journal of biogeography 40.9, pp. 1780– 1791. Casasoli, M, J Derory, C Morera-Dutrey, O Brendel, I Porth, JM Guehl, F Villani, and A Kremer (2006). “Comparison of quantitative trait loci for adaptive traits
43 between oak and chestnut based on an expressed sequence tag consensus map”.
In: Genetics 172.1, pp. 533–546. Chhatre, VE, LM Evans, SP DiFazio, and SR Keller (2018). “Adaptive introgres- sion and maintenance of a trispecies hybrid complex in range-edge populations of
Populus”. In: Molecular ecology 27.23, pp. 4820–4838. Christe, C, KN Stölting, L Bresadola, B Fussi, B Heinze, D Wegmann, and C Lexer (2016). “Selection against recombinant hybrids maintains reproductive isolation
in hybridizing Populus species despite F1 fertility and recurrent gene flow”. In: Molecular ecology 25.11, pp. 2482–2498. Clark, RM, TN Wagler, P Quijada, and J Doebley (2006). “A distant upstream enhancer at the maize domestication gene tb1 has pleiotropic e ects on plant
and inflorescent architecture”. In: Nature genetics 38.5, p. 594. Cokus, SJ, PF Gugger, and VL Sork (2015). “Evolutionary insights from de novo
transcriptome assembly and SNP discovery in California white oaks”. In: BMC genomics 16.1, p. 552. Dahlgren, RA, MJ SINGER, and X Huang (1997). “Oak tree and grazing impacts on
soil properties and nutrients in a California oak woodland”. In: Biogeochemistry 39.1, pp. 45–64. De Carvalho, D, PK Ingvarsson, J Joseph, L Suter, C Sedivy, D MACAYA-SANZ, J Cottrell, B Heinze, I Schanzer, and C Lexer (2010). “Admixture facilitates adap-
tation from standing variation in the European aspen (Populus tremula L.), a widespread forest tree”. In: Molecular Ecology 19.8, pp. 1638–1650. Delfino-Mix, A, JW Wright, PF Gugger, C Liang, and VL Sork (2015). “Establishing
a range-wide provenance test in valley oak (Quercus lobata Née) at two Califor-
44 nia sites”. In: Gen. Tech. Rep. PSW-GTR-251. Berkeley, CA: US Department of Agriculture, Forest Service, Pacific Southwest Research Station: 413-424 251, pp. 413–424. Derory, J, C Scotti-Saintagne, E Bertocchi, L Le Dantec, N Graignic, A Jau res, M Casasoli, E Chancerel, C Bodénès, F Alberto, et al. (2010). “Contrasting rela- tionships between the diversity of candidate genes and variation of bud burst in
natural and segregating populations of European oaks”. In: Heredity 104.5, p. 438. Derory, J, P Léger, V Garcia, J Schae er, MT Hauser, F Salin, C Luschnig, C Plomion, J Glössl, and A Kremer (2006). “Transcriptome analysis of bud burst in sessile
oak (Quercus petraea)”. In: New Phytologist 170.4, pp. 723–738. Di Lonardo, S, M Capuana, M Arnetoli, R Gabbrielli, and C Gonnelli (2011). “Ex-
ploring the metal phytoremediation potential of three Populus alba L. clones using an in vitro screening”. In: Environmental Science and Pollution Research 18.1, pp. 82–90. DiFazio, SP, S Leonardi, GT Slavov, SL Garman, WT Adams, and SH Strauss (2012). “Gene flow and simulation of transgene dispersal from hybrid poplar plantations”.
In: New Phytologist 193.4, pp. 903–915. DiFazio, SP, GT Slavov, and CP Joshi (2011). “Populus: a premier pioneer system for plant genomics”. In: Enfield, NH: Science Publishers, pp. 1–28. Dillon, S, R McEvoy, DS Baldwin, S Southerton, C Campbell, Y Parsons, and GN
Rees (2015). “Genetic diversity of Eucalyptus camaldulensis Dehnh. following pop- ulation decline in response to drought and altered hydrological regime”. In: Austral Ecology 40.5, pp. 558–572.
45 Dosskey, M, R Schultz, and T Isenhart (1997). “Riparian Bu ers for Agricultural
Land in Agroforestry Notes”. In: National Agroforestry Center, USDA Forest Service, Rocky Mountain Station USDA Natural Resources Conservation Service, East Campus-UNL, Lincoln, Nebraska. Du, Q, B Xu, C Gong, X Yang, W Pan, J Tian, B Li, and D Zhang (2014). “Variation
in growth, leaf, and wood property traits of Chinese white poplar (Populus tomen- tosa), a major industrial tree species in Northern China”. In: Canadian journal of forest research 44.4, pp. 326–339. Durand, J, C Bodénès, E Chancerel, JM Frigerio, G Vendramin, F Sebastiani, A Buonamici, O Gailing, HP Koelewijn, F Villani, et al. (2010). “A fast and cost- e ective approach to develop and map EST-SSR markers: oak as a case study”.
In: BMC genomics 11.1, p. 570. Elmore, A, C Stylinski, and K Pradhan (2016). “Synergistic use of citizen science and remote sensing for continental-scale measurements of forest tree phenology”. In:
Remote Sensing 8.6, p. 502. Espinoza, O, U Buehlmann, M Bumgardner, and B Smith (2011). “Assessing changes in the US hardwood sawmill industry with a focus on markets and distribution”.
In: BioResources 6.3, pp. 2676–2689. Evans, LM, GJ Allan, SP DiFazio, GT Slavov, JA Wilder, KD Floate, SB Rood, and TG Whitham (2015). “Geographical barriers and climate influence demographic
history in narrowleaf cottonwoods”. In: Heredity 114.4, p. 387. Evans, LM, GT Slavov, E Rodgers-Melnick, J Martin, P Ranjan, W Muchero, AM Brunner, W Schackwitz, L Gunter, JG Chen, et al. (2014). “Population genomics
46 of Populus trichocarpa identifies signatures of selection and adaptive trait associ- ations”. In: Nature genetics 46.10, p. 1089. Fan, D, T Liu, C Li, B Jiao, S Li, Y Hou, and K Luo (2015). “E cient CRISPR/Cas9-
mediated targeted mutagenesis in Populus in the first generation”. In: Scientific reports 5, p. 12217. Fang, GC, BP Blackmon, ME Staton, CD Nelson, TL Kubisiak, BA Olukolu, D Henry, T Zhebentyayeva, CA Saski, CH Cheng, et al. (2013). “A physical map of
the Chinese chestnut (Castanea mollissima) genome and its integration with the genetic map”. In: Tree genetics & genomes 9.2, pp. 525–537. Fetter, KC and A Weakley (2019). “Reduced Gene Flow from Mainland Populations
of Liriodendron tulipifera into the Florida Peninsula Promotes Diversification”. In: International Journal of Plant Sciences 180.3, pp. 253–269. Feurdean, A, SA Bhagwat, KJ Willis, HJB Birks, H Lischke, and T Hickler (2013). “Tree migration-rates: narrowing the gap between inferred post-glacial rates and
projected rates”. In: PLoS One 8.8, e71797. Fitzpatrick, MC and SR Keller (2015). “Ecological genomics meets community-level modelling of biodiversity: Mapping the genomic landscape of current and future
environmental adaptation”. In: Ecology letters 18.1, pp. 1–16. Floate, KD (2004). “Extent and patterns of hybridization among the three species
of Populus that constitute the riparian forest of southern Alberta, Canada”. In: Canadian Journal of Botany 82.2, pp. 253–264. Floate, KD, J Godbout, MK Lau, N Isabel, and TG Whitham (2016). “Plant–
herbivore interactions in a trispecific hybrid swarm of Populus: assessing support
47 for hypotheses of hybrid bridges, evolutionary novelty and genetic similarity”. In:
New Phytologist 209.2, pp. 832–844. Foll, M and O Gaggiotti (2008). “A genome-scan method to identify selected loci appropriate for both dominant and codominant markers: a Bayesian perspective”.
In: Genetics 180.2, pp. 977–993. Foster, AJ, G Pelletier, P Tanguay, and A Séguin (2015). “Transcriptome Analysis
of Poplar during Leaf Spot Infection with Sphaerulina spp.” In: PLoS One 10.9, e0138162. Fralish, JS (2004). “The keystone role of oak and hickory in the central hardwood
forest”. In: Gen. Tech. Rep. SRS-73. Asheville, NC: US Department of Agriculture, Forest Service, Southern Research Station. pp. 78-87. Frichot, E, SD Schoville, G Bouchard, and O François (2013). “Testing for associations between loci and environmental gradients using latent factor mixed models”. In:
Molecular biology and evolution 30.7, pp. 1687–1699. Gentry, AH (1988). “Tree species richness of upper Amazonian forests”. In: Proceed- ings of the National Academy of Sciences 85.1, pp. 156–159. Geraldes, A, S Difazio, G Slavov, P Ranjan, W Muchero, J Hannemann, L Gunter, A Wymore, C Grassa, N Farzaneh, et al. (2013). “A 34K SNP genotyping array for
Populus trichocarpa: design, application to the study of natural populations and transferability to other Populus species”. In: Molecular Ecology Resources 13.2, pp. 306–323. Geraldes, A, N Farzaneh, CJ Grassa, AD McKown, RD Guy, SD Mansfield, CJ Dou-
glas, and QC Cronk (2014). “Landscape genomics of Populus trichocarpa: the role
48 of hybridization, limited gene flow, and natural selection in shaping patterns of
population structure”. In: Evolution 68.11, pp. 3260–3280. Geraldes, A, CA Hefer, A Capron, N Kolosova, F Martinez-Nuñez, RY Soolanayakana- hally, B Stanton, RD Guy, SD Mansfield, CJ Douglas, et al. (2015). “Recent Y
chromosome divergence despite ancient origin of dioecy in poplars (Populus)”. In: Molecular ecology 24.13, pp. 3243–3256. Gonzalez-Martinez, SC, KV Krutovsky, and DB Neale (2006). “Forest-tree population
genomics and adaptive evolution”. In: New Phytologist 170.2, pp. 227–238. Graham, EA, EC Riordan, EM Yuen, D Estrin, and PW Rundel (2010). “Public internet-connected cameras used as a cross-continental ground-based plant phe-
nology monitoring system”. In: Global Change Biology 16.11, pp. 3014–3023. Grivet, D, MF Deguilloux, RJ Petit, and VL Sork (2006). “Contrasting patterns of
historical colonization in white oaks (Quercus spp.) in California and Europe”. In: Molecular Ecology 15.13, pp. 4085–4093. Gugger, PF, SJ Cokus, and VL Sork (2016). “Association of transcriptome-wide se-
quence variation with climate gradients in valley oak (Quercus lobata)”. In: Tree Genetics & Genomes 12.2, p. 15. Gugger, PF, S Fitz-Gibbon, M Pellegrini, and VL Sork (2016). “Species-wide patterns
of DNA methylation variation in Quercus lobata and their association with climate gradients”. In: Molecular Ecology 25.8, pp. 1665–1680. Gugger, PF, JM Peñaloza-Ramírez, JW Wright, and VL Sork (2017). “Whole-transcriptome
response to water stress in a California endemic oak, Quercus lobata”. In: Tree physiology 37.5, pp. 632–644.
49 Günther, T and G Coop (2013). “Robust identification of local adaptation from allele
frequencies”. In: Genetics 195.1, pp. 205–220. Hacquard, S and CW Schadt (2015). “Towards a holistic understanding of the ben-
eficial interactions across the Populus microbiome”. In: New Phytologist 205.4, pp. 1424–1430.
Haldane, J (1948). “The theory of a cline”. In: Journal of genetics 48.3, pp. 277–284. Hall, D, V Luquez, VM Garcia, KR St Onge, S Jansson, and PK Ingvarsson (2007). “Adaptive population di erentiation in phenology across a latitudinal gradient in
European aspen (Populus tremula, L.): a comparison of neutral markers, candidate genes and phenotypic traits”. In: Evolution 61.12, pp. 2849–2860. Hall, D, XF MA, and PK Ingvarsson (2011). “Adaptive evolution of the Populus tremula photoperiod pathway”. In: Molecular ecology 20.7, pp. 1463–1474. Hamilton, MB (1999). “Tropical tree gene flow and seed dispersal”. In: Nature 401.6749, p. 129. Hancock, AM, B Brachi, N Faure, MW Horton, LB Jarymowycz, FG Sperone, C Toomajian, F Roux, and J Bergelson (2011). “Adaptation to climate across the
Arabidopsis thaliana genome”. In: Science 334.6052, pp. 83–86. Hartfield, M (2016). “Evolutionary genetic consequences of facultative sex and out-
crossing”. In: Journal of evolutionary biology 29.1, pp. 5–22. Herman, DJ, LJ Halverson, and MK Firestone (2003). “Nitrogen dynamics in an
annual grassland: oak canopy, climate, and microbial population e ects”. In: Eco- logical Applications 13.3, pp. 593–604. Hewitt, GM (1999). “Post-glacial re-colonization of European biota”. In: Biological journal of the Linnean Society 68.1-2, pp. 87–112.
50 Holeski, LM, MS Zinkgraf, JJ Couture, TG Whitham, and RL Lindroth (2013). “Transgenerational e ects of herbivory in a group of long-lived tree species: ma- ternal damage reduces o spring allocation to resistance traits, but not growth”.
In: Journal of Ecology 101.4, pp. 1062–1073. Holliday, JA, L Zhou, R Bawa, M Zhang, and RW Oubida (2016). “Evidence for extensive parallelism but divergent genomic architecture of adaptation along al-
titudinal and latitudinal gradients in Populus trichocarpa”. In: New Phytologist 209.3, pp. 1240–1251. Homolova, L, Z Malenovsk˝, JG Clevers, G García-Santos, and ME Schaepman (2013). “Review of optical-based remote sensing for plant trait mapping”. In:
Ecological Complexity 15, pp. 1–16. Huang, YN, H Zhang, S Rogers, M Coggeshall, and K Woeste (2015). “White oak growth after 23 years in a three-site provenance/progeny trial on a latitudinal
gradient in Indiana”. In: Forest Science. Imbert, E and F Lefévre (2003). “Dispersal and gene flow of Populus nigra (Salicaceae) along a dynamic river system”. In: Journal of Ecology 91.3, pp. 447–456. Ingvarsson, PK (2008). “Multilocus patterns of nucleotide polymorphism and the
demographic history of Populus tremula”. In: Genetics 180.1, pp. 329–340. Ingvarsson, PK, MV García, D Hall, V Luquez, and S Jansson (2006). “Clinal vari- ation in phyB2, a candidate gene for day-length-induced growth cessation and
bud set, across a latitudinal gradient in European aspen (Populus tremula)”. In: Genetics 172.3, pp. 1845–1853. Ingvarsson, PK, MV Garcia, V Luquez, D Hall, and S Jansson (2008). “Nucleotide polymorphism and phenotypic associations within and around the phytochrome
51 B2 locus in European aspen (Populus tremula, Salicaceae)”. In: Genetics 178.4, pp. 2217–2226. Isabel, N, M Lamothe, and SL Thompson (2013). “A second-generation diagnostic sin- gle nucleotide polymorphism (SNP)-based assay, optimized to distinguish among
eight poplar (Populus L.) species and their early hybrids”. In: Tree genetics & genomes 9.2, pp. 621–626. IvetiÊ, V, S StjepanoviÊ, J DevetakoviÊ, D StankoviÊ, and M äkoriÊ(2014). “Re- lationships between leaf traits and morphological attributes in one-year bareroot
Fraxinus angustifolia Vahl. seedlings”. In: Annals of Forest Research 57.2, pp. 197– 203. Järvinen, P, J Lemmetyinen, O Savolainen, and T Sopanen (2003). “DNA sequence
variation in BpMADS2 gene in two populations of Betula pendula”. In: Molecular ecology 12.2, pp. 369–384. Keller, SR, N Levsen, PK Ingvarsson, MS Olson, and P Ti n (2011). “Local selection
across a latitudinal gradient shapes nucleotide diversity in balsam poplar, Populus balsamifera L”. In: Genetics 188.4, pp. 941–952. Keller, SR, N Levsen, MS Olson, and P Ti n (2012). “Local adaptation in the
flowering-time gene network of balsam poplar, Populus balsamifera L.” In: Molec- ular biology and evolution 29.10, pp. 3143–3152. Keller, SR, MS Olson, S Silim, W Schroeder, and P Ti n (2010). “Genomic diversity, population structure, and migration following rapid range expansion in the Balsam
Poplar, Populus balsamifera”. In: Molecular Ecology 19.6, pp. 1212–1226.
52 Kempes, CP, GB West, K Crowell, and M Girvan (2011). “Predicting maximum tree
heights and other traits from allometric scaling and resource limitations”. In: PLoS One 6.6, e20551. Kim, YH, MD Kim, YI Choi, SC Park, DJ Yun, EW Noh, HS Lee, and SS Kwak
(2011). “Transgenic poplar expressing Arabidopsis NDPK2 enhances growth as well as oxidative stress tolerance”. In: Plant biotechnology journal 9.3, pp. 334– 347. Kleinknecht, GJ, HE Lintz, A Kruger, JJ Niemeier, MJ Salino-Hugg, CK Thomas, CJ Still, and Y Kim (2015). “Introducing a sensor to measure budburst and its
environmental drivers”. In: Frontiers in Plant Science 6, p. 123. Kremer, A, AG Abbott, JE Carlson, PS Manos, C Plomion, P Sisco, ME Staton, S
Ueno, and GG Vendramin (2012). “Genomics of Fagaceae”. In: Tree Genetics & Genomes 8.3, pp. 583–610. Kroeger, T, F Casey, P Alvarez, M Cheatum, and L Tavassoi (2010). An Economic Analysis of the Benefits of Habitat Conservation on California Rangelands. Wash- ington, D.C.: Conservation Economics Program, Defenders of Wildlife. Krutovsky, KV and DB Neale (2005). “Nucleotide diversity and linkage disequilibrium in cold-hardiness-and wood quality-related candidate genes in Douglas fir”. In:
Genetics 171.4, pp. 2029–2041. Langlet, O (1971). “Two hundred years genecology”. In: Taxon, pp. 653–721. Le Corre, V, N Machon, RJ Petit, and A Kremer (1997). “Colonization with long- distance seed dispersal and genetic structure of maternally inherited genes in forest
trees: a simulation study”. In: Genetics Research 69.2, pp. 117–125.
53 Lee, JH, ER Daugharthy, J Scheiman, R Kalhor, TC Ferrante, R Terry, BM Tur- czyk, JL Yang, HS Lee, J Aach, et al. (2015). “Fluorescent in situ sequencing (FISSEQ) of RNA for gene expression profiling in intact cells and tissues”. In:
Nature protocols 10.3, p. 442. Lepoittevin, C, C Bodénés, E Chancerel, L Villate, T Lang, I Lesur, C Boury, F Ehrenmann, D Zelenica, A Boland, et al. (2015). “Single-nucleotide polymorphism discovery and validation in high-density SNP array for genetic analysis in Euro-
pean white oaks”. In: Molecular ecology resources 15.6, pp. 1446–1459. Lesur, I, G Le Provost, P Bento, C Da Silva, JC Leplé, F Murat, S Ueno, J Bartholomé, C Lalanne, F Ehrenmann, et al. (2015). “The oak gene expression atlas: insights into Fagaceae genome evolution and the discovery of genes regulated during bud
dormancy release”. In: BMC genomics 16.1, p. 112. Levsen, ND, P Ti n, and MS Olson (2012). “Pleistocene speciation in the genus
Populus (Salicaceae)”. In: Systematic biology 61.3, p. 401. Liang, H, S Ayyampalayam, N Wickett, A Barakat, Y Xu, L Landherr, PE Ralph, Y Jiao, T Xu, SE Schlarbaum, et al. (2011). “Generation of a large-scale genomic
resource for functional and comparative genomics in Liriodendron tulipifera L.” In: Tree Genetics & Genomes 7.5, pp. 941–954. Linhart, YB and MC Grant (1996). “Evolutionary significance of local genetic di er-
entiation in plants”. In: Annual review of ecology and systematics 27.1, pp. 237– 277. Lotterhos, KE and MC Whitlock (2014). “Evaluation of demographic history and
neutral parameterization on the performance of FST outlier tests”. In: Molecular ecology 23.9, pp. 2178–2192.
54 Luppold, W and M Bumgardner (2008). “Regional analysis of hardwood lumber pro-
duction: 1963–2005”. In: Northern journal of applied forestry 25.3, pp. 146–150. Luquez, V, D Hall, BR Albrectsen, J Karlsson, P Ingvarsson, and S Jansson (2008).
“Natural phenological variation in aspen (Populus tremula): the SwAsp collec- tion”. In: Tree Genetics & Genomes 4.2, pp. 279–292. Ma, XF, D Hall, KRS Onge, S Jansson, and PK Ingvarsson (2010). “Genetic di eren- tiation, clinal variation and phenotypic associations with growth cessation across
the Populus tremula photoperiodic pathway”. In: Genetics 186.3, pp. 1033–1044. Madritch, MD, CC Kingdon, A Singh, KE Mock, RL Lindroth, and PA Townsend (2014). “Imaging spectroscopy links aspen genotype with below-ground processes
at landscape scales”. In: Philosophical Transactions of the Royal Society B: Bio- logical Sciences 369.1643, p. 20130194. Manel, S, C Perrier, M Pratlong, L Abi-Rached, J Paganini, P Pontarotti, and D Aurelle (2016). “Genomic resources and their influence on the detection of the
signal of positive selection in genome scans”. In: Molecular ecology 25.1, pp. 170– 184. Manel, S, MK Schwartz, G Luikart, and P Taberlet (2003). “Landscape genetics:
combining landscape ecology and population genetics”. In: Trends in ecology & evolution 18.4, pp. 189–197. Martins, K, PF Gugger, J Llanderal-Mendoza, A González-Rodríguez, ST Fitz-Gibbon, JL Zhao, H Rodríguez-Correa, K Oyama, and VL Sork (2018). “Landscape ge- nomics provides evidence of climate-associated genetic variation in Mexican pop-
ulations of Quercus rugosa”. In: Evolutionary Applications 11.10, pp. 1842–1858.
55 Martinsen, GD, TG Whitham, RJ Turek, and P Keim (2001). “Hybrid populations
selectively filter gene introgression between species”. In: Evolution 55.7, pp. 1325– 1335. McCauley, DE (1995). “The use of chloroplast DNA polymorphism in studies of gene
flow in plants”. In: Trends in ecology & evolution 10.5, pp. 198–202. McKown, AD, J Klápöt , RD Guy, A Geraldes, I Porth, J Hannemann, M Friedmann, W Muchero, GA Tuskan, J Ehlting, QCB Cronk, YA El-Kassaby, SD Mansfield, and CJ Douglas (2014). “Genome-wide association implicates numerous genes un-
derlying ecological trait variation in natural populations of Populus trichocarpa”. In: New Phytologist 203.2, pp. 535–553. McKown, AD, RD Guy, J Klápöt , A Geraldes, M Friedmann, QC Cronk, YA El- Kassaby, SD Mansfield, and CJ Douglas (2014). “Geographical and environmental
gradients shape phenotypic trait variation and genetic structure in Populus tri- chocarpa”. In: New Phytologist 201.4, pp. 1263–1276. McKown, AD, RD Guy, L Quamme, J Klápöt , J La Mantia, C Constabel, YA El-Kassaby, RC Hamelin, M Zifkin, and M Azam (2014). “Association genet-
ics, geography and ecophysiology link stomatal patterning in Populus trichocarpa with carbon gain and disease resistance trade-o s”. In: Molecular Ecology 23.23, pp. 5771–5790. McLachlan, JS, JS Clark, and PS Manos (2005). “Molecular indicators of tree migra-
tion capacity under rapid climate change”. In: Ecology 86.8, pp. 2088–2098. McLean, EH, SM Prober, WD Stock, DA Steane, BM Potts, RE Vaillancourt, and M Byrne (2014). “Plasticity of functional traits varies clinally along a rainfall
56 gradient in Eucalyptus tricarpa”. In: Plant, Cell & Environment 37.6, pp. 1440– 1451. McShea, WJ, WM Healy, P Devers, T Fearer, FH Koch, D Stau er, and J Waldon (2007). “Forestry matters: decline of oaks will impact wildlife in hardwood forests”.
In: The Journal of Wildlife Management 71.5, pp. 1717–1728. Meijón, M, SB Satbhai, T Tsuchimatsu, and W Busch (2014). “Genome-wide associ- ation study using cellular traits identifies a new regulator of root development in
Arabidopsis”. In: Nature genetics 46.1, p. 77. Meirmans, PG, M Lamothe, MC Gros-Louis, D Khasa, P Périnet, J Bousquet, and N Isabel (2010). “Complex patterns of hybridization between exotic and native
North American poplar species”. In: American Journal of Botany 97.10, pp. 1688– 1697. Menon, M, WJ Barnes, and MS Olson (2015). “Population genetics of freeze tolerance
among natural populations of Populus balsamifera across the growing season”. In: New Phytologist 207.3, pp. 710–722. Mishra, P and KC Panigrahi (2015). “GIGANTEA–an emerging story”. In: Frontiers in plant science 6, p. 8. Mock, KE, CM Callahan, MN Islam-Faridi, JD Shaw, HS Rai, SC Sanderson, CA Rowe, RJ Ryel, MD Madritch, RS Gardner, and PG Wolf (2012). “Widespread
triploidy in western North American aspen (Populus tremuloides)”. In: PLoS One 7.10, e48406. Mock, KE, C Rowe, MB Hooten, J Dewoody, and V Hipkins (2008). “Clonal dynamics
in western North American aspen (Populus tremuloides)”. In: Molecular Ecology 17.22, pp. 4827–4844.
57 Moran, EV, J Willis, and JS Clark (2012). “Genetic evidence for hybridization in red
oaks (Quercus sect. Lobatae, Fagaceae)”. In: American Journal of Botany 99.1, pp. 92–100.
Muller, CH (1952). “Ecological control of hybridization in Quercus: a factor in the mechanism of evolution”. In: Evolution, pp. 147–161. Myburg, AA, D Grattapaglia, GA Tuskan, U Hellsten, RD Hayes, J Grimwood, J
Jenkins, E Lindquist, H Tice, D Bauer, et al. (2014). “The genome of Eucalyptus grandis”. In: Nature 510.7505, p. 356. Myneni, R, J Dong, C Tucker, R Kaufmann, P Kauppi, J Liski, L Zhou, V Alexeyev, and M Hughes (2001). “A large carbon sink in the woody biomass of Northern
forests”. In: Proceedings of the National Academy of Sciences 98.26, pp. 14784– 14789. Nathan, R, GG Katul, HS Horn, SM Thomas, R Oren, R Avissar, SW Pacala, and SA Levin (2002). “Mechanisms of long-distance dispersal of seeds by wind”. In:
Nature 418.6896, p. 409. Neale, DB and A Kremer (2011). “Forest tree genomics: growing resources and ap-
plications”. In: Nature Reviews Genetics 12.2, p. 111. Neiman, M, MS Olson, and P Ti n (2009). “Selective histories of poplar protease in- hibitors: elevated polymorphism, purifying selection, and positive selection driving
divergence of recent duplicates”. In: New Phytologist 183.3, pp. 740–750. Oliverio, KA, M Crepy, EL Martin-Tryon, R Milich, SL Harmer, J Putterill, MJ Yanovsky, and JJ Casal (2007). “GIGANTEA regulates phytochrome A-mediated
photomorphogenesis independently of its role in the circadian clock”. In: Plant Physiology 144.1, pp. 495–502.
58 Olson, MS, N Levsen, RY Soolanayakanahally, RD Guy, WR Schroeder, SR Keller,
and P Ti n (2013). “The adaptive potential of Populus balsamifera L. to phe- nology requirements in a warmer global climate”. In: Molecular Ecology 22.5, pp. 1214–1230. Ordonez, N, MF Seidl, C Waalwijk, A Drenth, A Kilian, BP Thomma, RC Ploetz, and GH Kema (2015). “Worse comes to worst: bananas and Panama disease—when
plant and pathogen clones meet”. In: PLoS pathogens 11.11, e1005197. Ortego, J, PF Gugger, EC Riordan, and VL Sork (2014). “Influence of climatic niche suitability and geographical overlap on hybridization patterns among southern
Californian oaks”. In: Journal of Biogeography 41.10, pp. 1895–1908. Pavlik, BM, SG Johnson, and P M. (1991). Oaks of California. Los Olivos, CA: Cachuma Press. Petit, RJ, S Brewer, S Bordács, K Burg, R Cheddadi, E Coart, J Cottrell, UM Csaikl, B van Dam, JD Deans, et al. (2002). “Identification of refugia and post-glacial colonization routes of European white oaks based on chloroplast DNA and fossil
pollen evidence”. In: Forest ecology and management 156.1-3, pp. 49–74. Petit, RJ, J Carlson, AL Curtu, ML Loustau, C Plomion, A González-Rodríguez, V Sork, and A Ducousso (2013). “Fagaceae trees as models to integrate ecology,
evolution and genomics”. In: New Phytologist 197.2, pp. 369–371. Petit, RJ and A Hampe (2006). “Some evolutionary consequences of being a tree”.
In: Annu. Rev. Ecol. Evol. Syst. 37, pp. 187–214. Petit, RJ, FS Hu, and CW Dick (2008). “Forests of the past: a window to future
changes”. In: Science 320.5882, pp. 1450–1452.
59 Petit, RJ, E Pineau, B Demesure, R Bacilieri, A Ducousso, and A Kremer (1997).
“Chloroplast DNA footprints of postglacial recolonization by oaks”. In: Proceed- ings of the National Academy of Sciences 94.18, pp. 9996–10001. Pettorelli, N, K Safi, and W Turner (2014). Satellite remote sensing, biodiversity research and conservation of the future. Platt, A, PF Gugger, M Pellegrini, and VL Sork (2015). “Genome-wide signature of local adaptation linked to variable CpG methylation in oak populations”. In:
Molecular Ecology 24.15, pp. 3823–3830. Platt, A, BJ Vilhjálmsson, and M Nordborg (2010). “Conditions under which genome-
wide association studies will be positively misleading”. In: Genetics 186.3, pp. 1045– 1052. Plomion, C, JM Aury, J Amselem, T Alaeitabar, V Barbe, C Belser, H Bergès, C Bodénès, N Boudet, C Boury, A Canaguier, A Couloux, C Da Silva, S Duplessis, F Ehrenmann, B Estrada-Mairey, S Fouteau, N Francillonne, C Gaspin, C Guichard, C Klopp, K Labadie, C Lalanne, I Le Clainche, JC Leplé, G Le Provost, T Leroy, I Lesur, F Martin, J Mercier, C Michotey, F Murat, F Salin, D Steinbach, P Faivre- Rampant, P Wincker, J Salse, H Quesneville, and A Kremer (2016). “Decoding the oak genome: public release of sequence data, assembly, annotation and publication
strategies”. In: Molecular ecology resources 16.1, pp. 254–265. Porth, I, J Klápöt , AD McKown, J La Mantia, RD Guy, PK Ingvarsson, R Hamelin, SD Mansfield, J Ehlting, CJ Douglas, et al. (2015). “Evolutionary quantitative
genomics of Populus trichocarpa”. In: PloS one 10.11, e0142864. Porth, I, J Klapöte, O Skyba, J Hannemann, AD McKown, RD Guy, SP DiFazio, W Muchero, P Ranjan, GA Tuskan, MC Friedmann, J Ehlting, QCB Cronk, YA
60 El-Kassaby, CJ Douglas, and SD Mansfield (2013). “Genome-wide association
mapping for wood characteristics in Populus identifies an array of candidate single nucleotide polymorphisms”. In: New Phytologist 200.3, pp. 710–726. Porth, I, M Koch, M Berenyi, A Burg, and K Burg (2005). “Identification of adaptation- specific di erences in mRNA expression of sessile and pedunculate oak based on
osmotic-stress-induced genes”. In: Tree physiology 25.10, pp. 1317–1329. Rockman, MV (2012). “The QTN program and the alleles that matter for evolution:
all that’s gold does not glitter”. In: Evolution: International Journal of Organic Evolution 66.1, pp. 1–17. Savolainen, O, T Pyhäjärvi, and T Knürr (2007). “Gene flow and local adaptation in
trees”. In: Annu. Rev. Ecol. Evol. Syst. 38, pp. 595–619. Schmitz, RJ, MD Schultz, MA Urich, JR Nery, M Pelizzola, O Libiger, A Alix, RB McCosh, H Chen, NJ Schork, and J Ecker (2013). “Patterns of population epige-
nomic diversity”. In: Nature 495.7440, p. 193. Sheehan, MJ, LM Kennedy, DE Costich, and TP Brutnell (2007). “Subfunctional- ization of PhyB1 and PhyB2 in the control of seedling and mature plant traits in
maize”. In: The Plant Journal 49.2, pp. 338–353. Slavov, GT, SP DiFazio, J Martin, W Schackwitz, W Muchero, E Rodgers-Melnick, MF Lipphardt, CP Pennacchio, U Hellsten, LA Pennacchio, LE Gunter, P Ran- jan, K Vining, KR Pomraning, LJ Wilhelm, M Pellegrini, TC Mockler, M Freitag, A Geraldes, YA El-Kassaby, SD Mansfield, QCB Cronk, CJ Douglas, SH Strauss, D Rokhsar, and GA Tuskan (2012). “Genome resequencing reveals multiscale ge-
ographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa”. In: New Phytologist 196.3, pp. 713–725.
61 Soler, M, O Serra, M Molinas, E García-Berthou, A Caritat, and M Figueras (2008). “Seasonal variation in transcript abundance in cork tissue analyzed by real time
RT-PCR”. In: Tree Physiology 28.5, pp. 743–751. Soolanayakanahally, RY, RD Guy, SN Silim, EC Drewes, and WR Schroeder (2009). “Enhanced assimilation rate and water use e ciency with latitude through in-
creased photosynthetic capacity and internal conductance in balsam poplar (Pop- ulus balsamifera L.)” In: Plant, Cell & Environment 32.12, pp. 1821–1832. Soolanayakanahally, RY, RD Guy, SN Silim, and M Song (2013). “Timing of pho-
toperiodic competency causes phenological mismatch in balsam poplar (Populus balsamifera L.)” In: Plant, cell & environment 36.1, pp. 116–127. Soolanayakanahally, RY, RD Guy, NR Street, KM Robinson, SN Silim, BR Albrect-
sen, and S Jansson (2015). “Comparative physiology of allopatric Populus species: geographic clines in photosynthesis, height growth, and carbon isotope discrimi-
nation in common gardens”. In: Frontiers in plant science 6, p. 528. Sork, VL, FW Davis, PE Smouse, VJ Apsit, RJ Dyer, J Fernandez-M, and B Kuhn
(2002). “Pollen movement in declining populations of California Valley oak, Quer- cus lobata: where have all the fathers gone?” In: Molecular Ecology 11.9, pp. 1657– 1668. Sork, VL, ST Fitz-Gibbon, D Puiu, M Crepeau, PF Gugger, R Sherman, K Stevens, CH Langley, M Pellegrini, and SL Salzberg (2016). “First draft assembly and anno-
tation of the genome of a California endemic oak Quercus lobata Née (Fagaceae)”. In: G3: Genes, Genomes, Genetics 6.11, pp. 3485–3495. Sork, VL and PE Smouse (2006). “Genetic analysis of landscape connectivity in tree
populations”. In: Landscape ecology 21.6, pp. 821–836.
62 Sork, VL, PE Smouse, VJ Apsit, RJ Dyer, and RD Westfall (2005). “A two-generation
analysis of pollen pool genetic structure in flowering dogwood, Cornus florida (Cornaceae), in the Missouri Ozarks”. In: American journal of botany 92.2, pp. 262– 271. Sork, VL, K Squire, PF Gugger, SE Steele, ED Levy, and AJ Eckert (2016). “Land- scape genomic analysis of candidate genes for climate adaptation in a California
endemic oak, Quercus lobata”. In: American Journal of Botany 103.1, pp. 33–46. Sork, V, S Aitken, R Dyer, A Eckert, P Legendre, and D Neale (2013). “Putting the landscape into the genomics of trees: approaches for understanding local adapta-
tion and population responses to changing climate”. In: Tree Genetics & Genomes 9.4, pp. 901–911. Sorrie, BA and AS Weakley (2001). “Coastal plain vascular plant endemics: phyto-
geographic patterns”. In: Castanea, pp. 50–82. Standiford, RB and RE Howitt (1993). “Multiple use management of California’s
hardwood rangelands.” In: Rangeland Ecology & Management/Journal of Range Management Archives 46.2, pp. 176–182. Steane, DA, BM Potts, E McLean, SM Prober, WD Stock, RE Vaillancourt, and M Byrne (2014). “Genome-wide scans detect adaptation to aridity in a widespread
forest tree species”. In: Molecular Ecology 23.10, pp. 2500–2513. Stinchcombe, JR and HE Hoekstra (2008). “Combining population genomics and quantitative genetics: finding the genes underlying ecologically important traits”.
In: Heredity 100.2, p. 158.
63 Strei , R (1999). “Pollen dispersal inferred from paternity analysis in a mixed oak
stand of Quercus robur L. and Q. petraea (Matt.) Liebl”. In: Molecular Ecology 8, pp. 831–841. Suarez-Gonzalez, A, CA Hefer, C Christe, O Corea, C Lexer, QC Cronk, and CJ Douglas (2016). “Genomic and functional approaches reveal a case of adaptive
introgression from Populus balsamifera (balsam poplar) in P. trichocarpa (black cottonwood)”. In: Molecular ecology 25.11, pp. 2427–2442. Swarbreck, D, C Wilks, P Lamesch, TZ Berardini, M Garcia-Hernandez, H Foerster, D Li, T Meyer, R Muller, L Ploetz, et al. (2007). “The Arabidopsis Informa-
tion Resource (TAIR): gene structure and function annotation”. In: Nucleic acids research 36.suppl_1, pp. D1009–D1014. Tarkka, MT, S Herrmann, T Wubet, L Feldhahn, S Recht, F Kurth, S Mailänder, M Bönn, M Neef, O Angay, M Bacht, M Graf, H Maboreke, F Fleischmann, TEE Grams, L Ruess, M Schädler, R Brandl, S Scheu, SD Schrey, I Grosse, and F Buscot (2013). “OakContig DF 159.1, a reference library for studying di erential
gene expression in Quercus robur during controlled biotic interactions: use for quantitative transcriptomic profiling of oak roots in ectomycorrhizal symbiosis”.
In: New Phytologist 199.2, pp. 529–540. Taylor, G (2002). “Populus: Arabidopsis for forestry. Do we need a model tree?” In: Annals of Botany 90.6, pp. 681–689. Thompson, SL, M Lamothe, PG Meirmans, P Perinet, and N Isabel (2010). “Repeated
unidirectional introgression towards Populus balsamifera in contact zones of exotic and native poplars”. In: Molecular Ecology 19.1, pp. 132–145.
64 Ti n, P and J Ross-Ibarra (2014). “Advances and limits of using population genetics
to understand local adaptation”. In: Trends in ecology & evolution 29.12, pp. 673– 680. Toomey, M, MA Friedl, S Frolking, K Hufkens, S Klosterman, O Sonnentag, DD Baldocchi, CJ Bernacchi, SC Biraud, G Bohrer, et al. (2015). “Greenness indices from digital cameras predict the timing and seasonal dynamics of canopy-scale
photosynthesis”. In: Ecological Applications 25.1, pp. 99–115. Tuskan, GA et al. (2006). “The genome of black cottonwood, Populus trichocarpa (Torr. & Gray)”. In: science 313.5793, pp. 1596–1604. Ueno, S, G Le Provost, V Léger, C Klopp, C Noirot, JM Frigerio, F Salin, J Salse, M Abrouk, F Murat, O Brendel, J Derory, P Abadie, P Léger, C Cabane, A Barré, A de Daruvar, A Couloux, P Wincker, MP Reviron, A Kremer, and C Plomion (2010). “Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing
methods for a keystone forest tree species: oak”. In: BMC genomics 11.1, p. 650. Van Valen, L (1976). “Ecological species, multispecies, and oaks”. In: Taxon, pp. 233– 239. Verhoeven, KJ, JJ Jansen, PJ van Dijk, and A Biere (2010). “Stress-induced DNA
methylation changes and their heritability in asexual dandelions”. In: New Phy- tologist 185.4, pp. 1108–1118. Wang, L, P Ti n, and MS Olson (2014). “Timing for success: expression pheno-
type and local adaptation related to latitude in the boreal forest tree, Populus balsamifera”. In: Tree genetics & genomes 10.4, pp. 911–922.
65 Whitham, TG, SP DiFazio, JA Schweitzer, SM Shuster, GJ Allan, JK Bailey, and SA Woolbright (2008). “Extending genomics to natural communities and ecosystems”.
In: Science 320.5875, pp. 492–495. Yang, WY, J Novembre, E Eskin, and E Halperin (2012). “A model-based approach
for analysis of spatial structure in genetic data”. In: Nature genetics 44.6, p. 725. Zhang, P, F Wu, X Kang, C Zhao, and Y Li (2015). “Genotypic variations of biomass
feedstock properties for energy in triploid hybrid clones of Populus tomentosa”. In: BioEnergy Research 8.4, pp. 1705–1713. Zhou, L, R Bawa, and J Holliday (2014). “Exome resequencing reveals signatures of demographic and adaptive processes across the genome and range of black
cottonwood (Populus trichocarpa)”. In: Molecular ecology 23.10, pp. 2486–2499.
66 Chapter 2
StomataCounter: a neural network for automatic stomata identifica- tion and counting.
Karl C. Fettera,b,1,2, Sven Eberhardtc,1, Rich S. Barclayb, Scott Wingb, and Stephen R. Kellera aDepartment of Plant Biology, University of Vermont bDepartment of Paleobiology, Smithsonian Institution, National Museum of Natural History cAmazon.com, Inc. 1S.E. and K.C.F. contributed equally to this work 2Corresponding author: Karl C. Fetter (e-mail: [email protected], tel: +1 802 656 2930)
67 Abstract
Stomata regulate important physiological processes in plants and are often pheno- typed by researchers in diverse fields of plant biology. Currently, there are no user friendly, fully-automated methods to perform the task of identifying and counting stomata, and stomata density is generally estimated by manually counting stomata. We introduce StomataCounter, an automated stomata counting system using a deep convolutional neural network to identify stomata in a variety of di erent microscopic images. We use a human-in-the-loop approach to train and refine a neural network on a taxonomically diverse collection of microscopic images. Our network achieves 98.1% identification accuracy on Ginkgo SEM micrographs, and 94.2% transfer accuracy when tested on untrained species. To facilitate adoption of the method, we provide the method in a publicly available website at http://www.stomata.uvm.edu. 2.1 Introduction
Stomata are important microscopic organs which mediate important biological and ecological processes and function to exchange gas with the environment. A stoma is composed of a pair of guard cells forming an aperture that, in some species, are flanked by a pair of subsidiary cells (Bergmann and Sack, 2007). Regulation of the aperture pore size, and hence, the opening and closing of the pore, is achieved through changing turgor pressure in the guard cells (Shimazaki et al., 2007). Stomata evolved to permit exchange between internal and external sources of gases, most notably CO2 and water vapor, through the impervious layer of the cuticle (Kim et al., 2010). Because of their importance in regulating plant productivity and response to the environment, stomata have been one of the key functional traits of interest to re- searchers working across scales in plant biology. At the molecular level, regulation of stomata has been the subject of numerous genetic studies (see Shimazaki et al., 2007 and (Kim et al., 2010) for detailed reviews) and crop improvement programs have modified stomata phenotypes to increases yield (Fischer et al., 1998). Stomata also mediate trade-o s between carbon gain and pathogen exposure that are of interest to plant ecophysiologists and pathologists. For example, foliar pathogens frequently exploit the aperture pore as a site of entry. In Populus, some species, and even populations within species, evolve growth strategies that maximize carbon fixation through increased stomatal density and aperture pore size on the adaxial leaf surface. This adaptation results in a cost of increased infection by fungal pathogens which have more sites of entry to the leaf (McKown et al., 2014). Stomata, as sites of water vapor exchange, are also implicated in driving environmental change across biomes
69 (Hetherington and Woodward, 2003) and variation of stomatal density and aperture pore length are linked to changes of ecosystem productivity (Wang et al., 2015). Stomatal traits are of particular interest to paleoecologists and paleoclimatologists due to the relationship between stomatal traits and gas exchange. Measurements of stomatal density from fossil plants have been proposed as an indicator of paleoatmo- spheric CO2 concentration (Royer, 2001), and measuring stomatal traits to predict paleoclimates has become widely adopted (JC McElwain and Steinthorsdottir, 2017). It is clear from their physiological importance that researchers across a wide vari- ety of disciplines in plant biology will phenotype stomatal traits for decades to come. A typical stomatal phenotyping pipeline consists of collecting plant tissue, creating a mounted tissue for imaging, imaging of the specimen, and manual phenotyping of a trait of interest. This last step can be the most laborious, costly, and time-consuming task, reducing the e ciency of the data acquisition and analysis pipeline. This is es- pecially important in large-scale plant breeding and genome-wide association studies, where phenotyping has been recognized as the new data collection bottleneck, in com- parison to the relative ease of generating large genome sequence datasets (Hudson, 2008). Here, we seek to minimize the burden of high-throughput phenotyping of stom- atal traits by introducing an automated method to identify and count stomata from micrographs. Although automated phenotyping methods using computer vision have been developed (Higaki et al., 2014; Laga et al., 2014; Duarte et al., 2017; Jayakody et al., 2017), these highly specialized approaches require feature engineering specific to a collection of images. These methods do not transfer well to images created with novel imaging and processing protocols. Additionally, tuning hand-crafted methods
70 to work on a general set of conditions is cumbersome and often impossible to achieve. Recent applications of deep learning techniques to stomata counting have demon- strated success (Bhugra et al., 2018; Aono et al., 2019), but are not easily accessible to the public and use training data sets sampled from few taxa. Deep convolutional neural networks (DCNN) circumvent specialized approaches by training the feature detector along with the classifier (LeCun et al., 2015). The method has been widely successful on a range of computer vision problems in biology such as medical imaging (see Shen et al. (2017) for a review) or macroscopic plant phenotyping (Ubbens and Stavness, 2017). The main caveat of deep learning methods is that a large number of parameters have to be trained in the feature detector. Im- provements in network structure (He et al., 2016) and training procedure (Simonyan and Zisserman, 2014) have helped training of large networks which incorporate gradi- ent descent learning methods and prove to be surprisingly resilient against overfitting (Poggio et al., 2017). Nevertheless, a large number of correctly annotated training images is still required to allow the optimizer to converge to a correct feature repre- sentation. Large labeled training sets like the ImageNet database exist (Deng et al., 2009), but for a highly specialized problems, such as stomata identification, publicly available datasets are not available at the scale required to train a typical DCNN. We solve these problems by creating a large and taxonomically diverse training dataset of plant cuticle micrographs and by creating a network with a human-in- the-loop approach. Our development of this method is provided to the public as a web-based tool called StomataCounter which allows plant biologists to rapidly upload plant epidermal image datasets to pre-trained networks and then annotate stomata on cuticle images when desired. We apply this tool to a the training dataset and
71 achieve robust identification and counts of stomata on a variety of angiosperm and pteridosperm taxa.
2.2 Materials and Methods
2.2.1 Biological Material
Micrographs of plant cuticles were collected from four sources: the cuticle database
(https://cuticledb.eesi.psu.edu/, Barclay, J McElwain, et al., 2007); a Ginkgo common garden experiment (Barclay and Wing, 2016); a new intraspecific collection of balsam poplar (Populus balsamifera); and a new collection from living material at the Smithsonian, National Museum of Natural History (USNM) and the United States Botanic Garden (USBG) (Table 2.1). Specimens in the cuticle database col- lection were previously prepared by clearing and staining leaf tissue and then imaged. The entire collection of the cuticle database was downloaded on November 16, 2017. Downloaded images contained both the abaxial and adaxial cuticles in a single image, and were automatically separated with a custom bash script. Abaxial cuticle micro- graphs were discarded if no stomata were visible or if the image quality was so poor that no stomata could be visually identified by a human. Gingko micrographs were prepared by chemically separating the upper and lower cuticles and imaging with an environmental SEM Barclay and Wing, 2016. To create the poplar dataset, dormant cuttings of P. balsamifera genotypes were collected across the eastern and central por- tions of its range in the United States and Canada by S.R. Keller et al. and planted in common gardens in Vermont and Saskatchewan in 2014. Fresh leaves were sampled
72 from June to July 2015 and immediately placed in plastic bags and then a cooler for temporary storage, up to three hours. In a laboratory, nail polish (Sally Hansen, big kwik dry top coat #42494) was applied to a 1 cm2 region of the adaxial and abaxial leaf surfaces, away from the mid-vein, and allowed to dry for approximately twenty minutes. The dried cast of the epidermal surface was lifted with clear tape and mounted onto a glass slide. The collection at the USNM and USBG was made similarly from opportunistically sampled tissues of the gardens around the National Museum of Natural History building. USBG collections were focused on the Hawai- ian, southern exposure, and medicinal plant, and tropical collections. The taxonomic identity of each specimen was recorded according to the existing label next to each plant. The USNM/USBG collection was imaged with an Olympus BX-63 microscope using di erential interference contrast (DIC). Some specimens had substantial relief and z-stacked images were created using cellSens image stacking software (Olympus Corporation). Slides were imaged in three non-adjacent areas per slide. The poplar collection was imaged with an Olympus BX-60 using DIC and each slide was imaged in two areas. Mounted material for the USNM/USBG and Ginkgo collections are deposited in the Smithsonian Institution, National Museum of Natural History in the Department of Paleobiology. Material for the poplar collection is deposited in the Keller laboratory in the Department of Plant Biology at the University of Vermont, USA. The training dataset totaled 4618 images (Table 2.1).
2.2.2 Deep Convolutional Neural Network
We used a deep convolutional neural network (DCNN) to generate a stomata like- lihood map for each input image, followed by thresholding and peak detection to
73 localize and count stomata (Fig. 1). Because dense per-pixel annotations of stoma versus non-stoma are di cult to acquire in large quantity, we trained a simple im- age classification DCNN based on the AlexNet structure instead (Krizhevsky et al., 2012), and copied the weights into a fully convolutional network to allow per-location assessment of stomata likelihood. Although this method does not provide dense per- pixel annotations, the resulting resolution proved to be high enough to di erentiate and count individual stomata. We used pre-trained weights for the lowest five convolutional layers from conv1 to conv5. The weights were taken from the ILSVRC image classification tasks (Rus- sakovsky et al., 2015) made available in the ca enet distribution (Jia et al., 2014). All other layers were initialized using Gaussian initialization with scale 0.01. Training was performed using a standard stochastic gradient descent solver as in Krizhevsky et al. (2012), with learning rate 0.001 for pre-trained layers and 0.01 for randomly initialized layers with a momentum of 0.9. Because the orientation of any individual stomata does not hold information for identification, we augmented data by rotating all training images into eight di erent orientations, applied random flipping and randomly positioned crop regions of the input size within the extracted 256x256 image patch. For distractors, we sampled patches from random image regions on human-annotated images that were at least 256 pixels distant from any labeled stoma. The trained network weights were transferred into a fully convolutional network (Long et al., 2015), which replaces the final fully connected layers by convolutions. To increase the resolution of the detector slightly, we reduced the stride of layer pool5 from two to one, and added a dilation (Yu and Koltun, 2015) of two to layer fc6conv to compensate. Due to margins (96 pixels) and stride (32 over all layers), application
74 of the fully convolutional network to an image of size s yields an output probability map p of size s (96 2)/32 along each dimension. ≠ ú To avoid detecting low-probability stomata within the noise, the probability map p was thresholded and all values below the threshold p-thresh=0.98 were set to zero. Local peak detection was run on a 3x3 pixel neighborhood on the thresholded map, and each peak, excluding those located on a 96-pixel width border, was labeled as a stoma center. This intentionally excludes stomata for which the detection peak is found near the border within the model margin to match the instructions given to human annotators. Resulting stomata positions were projected back onto the original image (see Fig. 2.2).
We built a user-friendly web service, StomataCounter, freely available at http:
//stomata.uvm.edu/, to allow the scientific community easy access to the method. We are using a flask/jquery/bootstrap stack. Source codes for network training, as well as the webserver are available at http://stomata.uvm.edu/source.To use StomataCounter, users upload single .jpg images or a zip files of of their .jpg images containing leaf cuticles prepared. Z-stacked image sets should be combined into single image before uploading. A new dataset is then created where the output of the automatic counts, image quality scores and image metadata are recorded and can easily be exported for further analysis. In addition to automatic processing, the user can manually annotate stomata and determine the empirical error rate of the automatic counts through a straightforward and intuitive web interface. The annotations can then be reincorporated into the training dataset to improve future performance of StomataCounter by contacting the authors and requesting retraining of the DCNN.
75 2.2.3 Statistical Analyses
We tested the performance of the DCNN with a partitioned set of images from each dataset source. Whenever possible, images from a given species were used in either the training or test set, but not both, and only 7 out of 1467 species are included in both. In total, 1941 images were used to test the performance of the network (Table 2.1). After running the test set through the network, stomata were manually annotated. If the center of a stoma intersected the bounding box around the perimeter of the image, it was not counted. To evaluate the DCNN, we first determined if it could identify stomata when they are known to be present and fail to identify them when they are absent. To execute this test, a set of 25 randomly selected abaxial plant cuticle micrographs containing stomata were chosen from each of the four datasets for a total of 100 images. To create a set of test images known to lack stomata, 100 adaxial cuticle micrographs were randomly sampled from the cuticle database. Visual inspection confirmed that none of the adaxial images contained stomata. Micrographs of thoracic aorta from an experimental rat model of preeclampsia (Johnson and Cipolla, 2017), and breast cancer tissue micrographs (Gelasca et al., 2008) were used as negative controls from non-plant material. As a second test, we deteremined how well the method could identify stomata from stomatal patterning mutants in Arabidopsis that violate the single cell spacing rule. We sampled images from Papanatsiou et al. (2016) including
88 tmm1 and 90 wild type micrographs, and used a paired t-test to determine if counts from manual or automatic were statistically di erent. We then determined if the precision (see below) was di erent between mutant and wild type accessions with
76 ANOVA. Identification accuracy is tested by applying the DCNN to a small image patch either centered on a stoma (target), or taken from at least 256 pixels distance to any labeled stomata (distractor). This yields true positive (NTP), true negative (NTN), false positive (NFP) and false negative (NFN) samples. We define the classification accuracy A as:
N + N A = TP TN NTP + NTN + NFP + NTP
We define classification precision P as:
N P = TP NTPNFP
We assume the human counts contain only true positives and the automatic count contain true positives and false positives. As such, we define precision as:
Human Count Precision = log AAutomatic CountB
This definition of precision identifies over-counting errors as negative values and under-counting as positive values. This measure of precision is undefined if either manual or automatic count is zero, and 30 of the 1772 observations were discarded. These samples were either out of focus, lacked stomata entirely, or too grainy for human detection of stomata. Classification recall R was defined as:
N R = TP NTP + NFN
77 Classification accuracy, precision, and recall, were calculated from groups of im- ages constructed to span the diversity of imaging capture methods (i.e. brightfield, DIC, and SEM) and magnification (i.e. 200x and 400x) to determine how well the training set from one group transfers to identifying stomata in another group. Groups used for transfer assessment were the cuticle database, the Ginkgo collection, micro- graphs imaged at 200x, at 400x, and the combined set of images. We use linear regression to understand the relationship between human and au- tomatic stomata counts. For the purpose of calculating error statistics, we consider deviation from the human count attributable to error in the method. Images were partitioned for linear models by collection source, higher taxonomic group (i.e. Eu- dicot, Monocot, Magnoliid, Gymnosperm, Fern, or Lycophyte), and magnification. To understand how di erent sources of variance contribute to precision variance, we collected data on the taxonomic family, magnification, imaging method, and three measures of image quality. The taxonomic family of each image was determined us- ing the open tree of life taxonomy accessed with the rotl package (Michonneau et al., 2016). Two image quality measures used were based on the properties of an image’s power-frequency distribution tail and described the standard deviation (fSTD) and mean (fMean) of the tail. Low values of fSTD and fMean indicate a blurry image, while high values indicate non-blurry images. The third image quality measure, tEn- tropy, is a measure an image’s information content. High entropy values indicate high contrast/noisy images while low values indicate lower contrast. These image quality measures were created with PyImq (Koho et al., 2016) and standardized between zero and one. Random e ects linear models were created with the R package lme4 (Bates et al., 2014) by fitting log precision to taxonomic family, magnification, and
78 imaging method as factors. Linear models were fit for the scaled error and image quality scores. We used the root mean square error (RMSE) of the model residuals to understand how the factors and quality scores described the variance of log precision. Higher values of RMSE indicate larger residuals. Statistical analyses were conducted in python and R (R Core Team, 2013). Stomata counts, image quality scores, and taxonomy of training and test set micrographs are provided in supporting information table S1. Images used for the training and test sets are available for download from dryad (https://doi.org/10.5061/dryad.kh2gv5f).
2.3 Results
2.3.1 Stomata detection
StomataCounter was able to accurately identify and count stomata when they were present in an image. False positives were detected in the adaxial cuticle, aorta, and breast cancer cell image sets at low frequency (Fig. S2.1). The mean number of stomata detected in the adaxial, aorta, and breast cancer image sets was 1.5, 1.4, and 2.4, respectively, while the mean value of the abaxial set containing stomata was 24.1.
Stomata measured from micrographs of mutant and wild type Arabidopsis by human and automatic counting showed few di erences between counting method (Fig. S2.2). A paired T-test of human and automatic counts failed to reject the null hypothesis of no di erence in means (t = -0.91, df = 175, p-value = 0.37), but precision was di erent between wild type and mutant genotypes (ANOVA: F-value =
79 07 2 28.82, p-value = 2.51e≠ ). The variance of precision in Col-0 accessions (‡ = 0.189) was higher than the mutant accessions (‡2 = 0.153), possibly due to image capture settings on the microscope. Correspondence between automated and human stomata counting varied among the respective sample sets. There was close agreement among all datasets to the human count, with the exception of some of the samples imaged at 200x magnification (Fig. 2.3a-c). In these samples, the network tended to under count relative to human observers. Despite the variation among datasets and the large error present in the 200x dataset, the slopes of all models were close to 1 (Table 2.2). The 400x, Ginkgo, and cuticle database sets all performed well at lower stomata counts, as indicated by their proximity to the expected one-to-one line and decreased in precision as counts increased. Precision has a non-linear relationship with image quality, and changes in variance of precision are correlated with variation of image quality. With a ratio of 17:1 images in the training vs. test sets, the Poplar dataset had the best precision, followed by the Ginkgo, Cuticle database, and USNM/USBG datasets (Fig. 2.3d). Among the di erent imaging methods, SEM had the best precision, followed by DIC, and finally and brightfield microscopy (Fig. 2.3e). The variance of precision is higher for images with 200x magnification (Fig. 2.3f). RMSE values were lowest for taxonomic family and the family:magnification interaction, suggesting these factors contributed less to deviations between human and automated stomata counts than image quality or imaging method (Table 2.3).
80 2.3.2 Classification accuracy
The peak accuracy (94.2%) on the combined test sets is achieved when all training sets are combined (Fig. 2.6). The combined dataset performs best on all test subsets of the data; i. e. adding additional training data - even from di erent sets - is always beneficial for the generalization of the network. Accuracy from train to test within a single species is higher (e. g. Ginkgo training for Ginkgo test at 97.4%) than transfer within datasets with a large number species across families (400x training to 400x test: 85.5%). The network does not generalize well between vastly di erent scales, i.e. the 200x- dataset, which contains images downscaled to half the image width and height. In this case, only training within the same scale achieves high accuracy (97.3%), while adding additional samples from the larger scale reduces the performance (to 90.7%). Precision values are generally higher than recall (0.99 precision on the combined training and test sets; 0.93 recall, Fig. S2.3), which shows that we mostly miss stomata rather than misidentifying non-existing stomata. Increased training size is correlated with increased accuracy (Fig. 2.4), and pro- viding a large number of annotated images is beneficial, as it lifts training accuracy from 72.8% with a training set of ten images to 94.2% with the complete set of training images.
81 2.4 Discussion
Stomata are an important functional trait to many fields within plant biology, yet manual phenotyping of stomata counts is a laborious method that has few controls on human error and reproducibility. We created a fully automatic method for counting stomata that is both highly sensitive and reproducible, allows the user to quantity error in their counts, but is also entirely free of parameter optimization from the user. Furthermore, the DCNN can be iteratively retrained with new images to improve performance and adjust to the needs of the community. This is a particular advantage of this method for adjusting to new taxonomic sample sets. However, new users are not required to upload new image types or images from new species for the method to work on their material. The pre-loaded neural network was specifically trained on diverse set of image types and from many species (N = 739). As the complexity of processing pipelines in biological studies increases, repro- ducibility of studies increasingly becomes a concern (Vieland, 2001). Apart from the reduced workload, automated image processing provides better reproducibility than manual stomata annotations. For instance, if multiple experts count stomata, they may not agree, causing artificial di erences between compared populations. This includes how stomata at the edge of an image are counted, and what to do with di cult to identify edge cases. Automatic counters will have an objective measure, and introduce no systematic bias between compared sets as long as the same model is used. Additionally, our human counter missed stomata that the machine detected (Fig. 2.5). Our method is not the first to identify and count stomata. However, previous
82 methods have not been widely adopted by the community and a survey of recent literature indicates manual counting is the predominant method (Liu et al., 2018; Morales-Navarro et al., 2018; Sumathi et al., 2018; Peel et al., 2017; Takahashi et al., 2015). Previous methods relied on substantial image pre-processing to generate images for thresholding to isolate stomata for counting (Oliviera et al., 2014; Duarte et al., 2017). Thresholding can perform well in a homogenous collection of images, but quickly fails when images collected by di erent preparation and microscopy methods are provided to the thresholding method (K. Fetter, pers. obs.). Some methods also require the user to manually segment stomata and subsequently process those images to generate sample views to supply to template matching methods (Laga et al., 2014). Object-oriented methods (Jian et al., 2011) also require input from the user to define model parameters. These methods invariably requires the user to participate in the counting process to tune parameters and monitor the image processing, and are not fully autonomous. More recently, cascade classifier methods have been developed which perform well on small collections of test sets (Vialet-Chabrand and Brendel, 2014; Jayakody et al., 2017). Additionally, most methods rely on a very small set of images (50 to 500) typically sampled from just a few species or cultivars to create the training and/or test set (e.g., Bhugra et al., 2018). Apart from generalization concerns, several published methods require the user to have some experience coding in python or C++, a requirement likely to reduce the potential pool of end users. Our method resolves these issues by being publicly available, fully autonomous of the user, who is only required to upload jpeg formatted images, is free of any requirement for the user to code, and is trained on a relatively large and taxonomically diverse set of cuticle images. Users may wish to set up
83 StomataCounter locally on their own servers, and we have integrated the python scripts into an o ine program users can run at the command line. The command line version is available in the github repository: https://github.com/SvenTwo/epidermal. Users can generate model weights from a custom set of training images and supply it to the image processing script for stomatal identification, or use the predefined weights generated from this work. We have demonstrated that this method is capable of accurately identifying stom- ata when they are present, but false positives may still be generated by shapes in images that approximate the size and shape of stomata guard cells. Conversely, false negatives are generated when a stomata is hidden by a feature of the cuticle or if poor sample preparation/imaging introduces blur. This issue is likely avoidable through increased sample size of the training image set and good sample preparation and mi- croscopy techniques by the end user. The method is also able to detect stomata from mutant Arabidopsis genotypes that violate the single spacing rule. The importance of having a well-matched training and testing image set was ap- parent at 200x, where there was a subset of observations with low transfer accuracy (Fig. 2.6), and StomataCounter consistently under counted relative to human ob- servation (Fig. 2.3). We argue that transferring architecture between scales is not advisable and images should be created by the user to match the predominant size (2048x2048) and magnification (400x) of images in the training network. Our training set of images spanned 82 di erent families and was over-represented by angiosperms. Stomata in gymnosperms are typically sunken into pores that make it di cult to obtain good nail polish casts. Models tested to explain variation in scaled error re- vealed that taxonomic family and its interaction with magnification were the factors
84 that had the best explanatory power for scaled error. Future collections of gym- nosperm cuticles could be uploaded to the DCNN to retrain it in order to improve the performance of the method for gymnosperms. More generally however, this highlights how users will need to be thoughtful about matching of training and test samples for taxa that may deviate in stomata morphol- ogy from the existing reference database. We therefore recommend that users working with new or morphologically divergent taxa first run several pilot tests with di erent magnification and sample preparation techniques to find optimal choices that mini- mize error for their particular study system. SEM micrographs had the least amount of error, followed by DIC, and finally brightfield (Fig. 2.3g-i). Lastly, image quality was strongly related to log precision; predictably, images that are too noisy (i.e. high entropy) and out-of-focus (low fMean or fSTD) will generate higher error. Obtain- ing high quality, in-focus images should be a priority during data acquisition. We provide these guidelines for using the method, and recommend that users read these guidelines before collecting a large quantity of images:
• Collect sample images using di erent microscopy methods from the same tissues. We recommend an initial collection of 25 to 100 images prior to initiating a new large-scale study.
• Run images through StomataCounter.
• Establish a ’true’ stomata count using the annotation feature.
• Regress image quality scores (automatically provided in output csv file) against log precision.
85 • Regress human versus automatic counts and assess error.
• Choose the microscopy method that minimizes error and image the remaining samples.
Di erent microscopy methods can include using DIC or phase contrast filters, ad- justing the aperture to increase contrast, or staining tissue and imaging with under fluorescent light (see Eisele et al., 2016 for more suggestions). If a large collection of images is already available and re-imaging is not feasible, we recommend the users take the following actions:
• Randomly sample 100 images.
• Upload the images to StomataCounter.
• Annotate images to establish the ’true’ count.
• Explore image quality scores with against the log(precision) to determine a justifiable cut-o value for filtering images.
• Discard images below the image quality cut-o value.
New users can also contact the authors through the StomataCounter web interface and we may retrain the model to include the 100 annotated images. Fast and accurate counting of stomata increases productivity of workers and de- creases the time from collecting a tissue to analyzing the data. Until now, assessing
86 measurement error required phenotyping a reduced set of images multiple times by, potentially, multiple counters. With StomataCounter, users can instantly phenotype their images and annotate them to create empirical error rates. The open source code and flexibility of using new and customized training sets will make StomataCounter and important resource for the plant biology community.
2.5 Acknowledgements
We thank Kyle Wallick at the United States Botanic Garden for facilitating access to the living collections of the Garden. Scott Whitaker aided in imaging cuticle specimens with SEM and DIC microscopy in the Laboratories of Analytical Biology at the Smithsonian Institution, National Museum of Natural History. Rat aorta images were provided by Dr. A. Chapman and Dr. M. Cipolla at the University of Vermont. Arabidopsis stomatal patterning mutants were provided by Dr. Maria Papanatsiou and Dr. Michael Blatt. Dr. Terry Delaney at the University of Vermont kindly allowed KCF to use his microscope for imaging. Funding to create Stomata Counter was provided by an NSF grant to SRK (IOS-1461868) and a Smithsonian Institution Fellowship to KCF.
2.6 Data availability
Training and test set micrographs, model weights, and ca enet model definition pro- tocol are available as downloads from Dryad (https://doi.org/10.5061/dryad. kh2gv5f).
87 2.7 Author contributions
The research was conceived and performed by KCF and SE. The website and python scripts were written by SE. Data and cuticle images were collected and analyzed by KCF. Ginkgo Images were submitted by RSB and SW. All authors interpreted the results. The manuscript was written by KCF, SE, and SRK. All authors edited and approved the manuscript.
88 2.8 Tables
Table 2.1: Description of the datasets used for training and testing the network. Abbrevia- tions: DIC, Di erential interference contrast; SEM, scanning electron microscopy.
Training Test Preparation Imaging Z-stacks Dataset N N N N Magnification Citation images species images species method method applied Poplar 3123 1 175 1 Nail polish DIC 400 no This paper Ginkgo 408 1 200 1 Lamina peel SEM 200 no Barclay and Wing, 2016 Cuticle DB 678 613 696 599 Clear & stain Brightfield 400 no Barclay, J McElwain, et al., 2007 USNM/USBG 409 124 696 132 Nail polish DIC, SEM1 1002, 200, 400 yes This paper Aorta - - 116 1 Elastic-Van Gieson Brightfield 40 no Johnson and Cipolla, 2017 Hematoxylin Breast Cancer - - 58 1 Brightfield no Gelasca et al., 2008 & eosin stain Total 4618 739 1941 735 Note: 1) Training SEM N = 15; 2) training set, N = 4.
89 Table 2.2: Summary of linear model fit parameters in Figure 2.3 for di erent test datasets. Dataset definitions given in text. Abbreviations: y-intercept, a; standard error, SE. 200x † images removed from this USNM/USBG set. Significance indicated by: ú: P<0.05; úú: P<0.01; úúú: P<0.001. (A) Dataset Cuticle DB Poplar Ginkgo USNM/USBG USNM/USBG † a 3.315úúú 0.823 -0.364 10.068úúú -0.695 SEa (0.635) (0.456) (0.677) (2.041) (0.5) s 0.853úúú 0.978úúú 0.832úúú 0.997úúú 1.167úúú SEs (0.029) (0.014) (0.028) (0.089) (0.026) r2 0.578 0.964 0.812 0.157 0.835
(B) Taxonomic group Eudicot Monocot Magnoliid Gymnosperm Fern Lycophyte a 10.405úúú 2.360úúú 4.360ú 0.004 0.624 0.771 SEa (1.643) (0.598) (1.844) (0.735) (0.474) (1.414) s 0.808úúú 0.808úúú 0.912úúú 0.830úúú 0.799úúú 0.751úú SEs (0.063) (0.056) (0.086) (0.031) (0.039) (0.169) r2 0.141 0.617 0.272 0.778 0.938 0.739
(C) Magnification 200x 400x a 18.346*** 1.446*** SEa (3.192) (0.381) s 0.654*** 0.970*** SEs (0.123) (0.017) r2 0.055 0.740
90 Table 2.3: Root mean square error (RMSE) of the model residuals. Lower RMSE values suggest a better fit of the model. The response in each model was precision. Abbreviations: fMean, mean power-frequency tail distribution; fSTD, standard deviation of power-frequency tail distribution.
Model RMSE
family 0.463 ≥ magnification 0.523 ≥ family:magnification 0.399 ≥ imaging method 0.519 ≥ fMean 0.523 ≥ fSTD 0.523 ≥ tEntropy 0.521 ≥ tEntropy:imaging method 0.515 ≥
91 2.9 Figures
Figure 2.1: Architecture of the deep convolutional neural network (DCNN) and classification tasks. Left: Training and testing procedure. First column: Target patches were extracted and centered around human-labeled stomata center positions; distractor patches were extracted in all other regions. A binary image classification network was trained. Second column: The image classification network was applied fully convolutional to the test image to produce a prediction heatmap. On the thresholded heatmap, peaks were detected and counted.
92 Figure 2.2: Sample sizes of images for the top 20 families represented in the training (yel- low) and test (blue) datasets (a). Examples of pre- and post-analysis images. A probability heatmap map is overlain onto the input image in the red channel. Detected stomata marked with circles with peak values given in green (b). Species in image pairs: Adiantum peru- vianum, Begonia loranthoides (top row); Chaemaerauthum gadacardii, Echeandia texensis (second row); Ginkgo biloba, Populus balsamifera (third row); Trillium luteum, Pilea liba- nensis (bottom row). Scale bars in row one and two equal 20 µm, scale bars in row three equal 25 µm, and scale bars in row four equal 50 µm.
93 Figure 2.3: Results of human and automatic counts and precision for each dataset organized by collection source (a,d), taxonomic group (b), and magnification (c), and imaging method (e). Precision and image quality scores have non-linear relationships, and changes in vari- ance correlate to image quality (g-i). Positive values of precision indicate undercounting of stomata relative to human counts, while negative values indicate overcounting. The dotted black line is the 1:1 line. See Table 2.2 for model summaries. Abbreviations: tEntropy, image entropy ;fMean, mean power-frequency tail distribution; fSTD, standard deviation of power-frequency tail distribution. 94 Figure 2.4: Accuracy by training image count and the e ect of increasing the training set size on classification accuracies. Since classification is a binary task, chance level is at 50%. Training image count “all” includes all 4,618 annotated training images.
False Negatives False Positives Human-annotated stomata Human-annotated as non-stomata Machine-annotated as non-stomata Machine-annotated as stomata
Figure 2.5: Samples that were mislabeled with high confidence by either the machine or human. False negatives (left panel) from stomata that were detected by the human, but not by the machine. Image features typically generating false negatives are blurry images, artifacts obstruction the stomata, low contrast between epidermal and guard cells, and very small scale stomata. False positives (right panel) are image features that are labeled by the machine as stomata, but not by the human counter, or are missed by the human counter (e.g. top and bottom right images in the panel). Errors typically generating false positives are image artifacts that superficially resemble stomata, particularly shapes mimicking interior guard cell structure.
95 Figure 2.6: Accuracies for models trained on di erent training datasets (vertical), tested on di erent test datasets. Combined is a union of all training and test sets. For precision and recall values, see Supporting Information Fig. S2.3.
96 Supporting Information
*** b
60
40
a 20
Automatic stomata count Automatic a a
0 abaxial_cuticle adaxial_cuticle aorta breast_cancer
Figure S2.1: Stomata identification test results
a b ● c
9 9 ● ● ●
● 1.0 ● ● ● ● 6 *** a a*** 6 ● ● ● ● ● ● ● ● ● Freq
*** ● ● ● ● ● ● ● Human Count Stomata count b 0.5 *** 3 b 3 ● ● ● ● ●
● ● ● ● ●
● ● ● ● ●
0 0 0.0
mutant wt 0.0 2.5 5.0 7.5 −1 0 1 Automatic Count Precision
Counting method
Human Automatic ● mutant ● wt mutant wt
Figure S2.2: Stomata identification in Arabidopsis cell-patterning mutants.
97 Figure S2.3: Precision and recall of transferred training and testing datasets.
98 References
Aono, A, J Nagai, G Dickel, R Marinho, P Oliveira, and F Faria (2019). “A stomata classification and detection system in microscope images of maize cultivars”. In:
bioRxiv, p. 538165. Barclay, R, J McElwain, D Dilcher, and B Sageman (2007). “The cuticle database: developing an interactive tool for taxonomic and paleoenvironmental study of the
fossil cuticle record”. In: Courier-Forschungsinstitut Senckenberg 258, p. 39.
Barclay, R and S Wing (2016). “Improving the Ginkgo CO2 barometer: implications for the early Cenozoic atmosphere”. In: Earth and Planetary Science Letters 439, pp. 158–171. Bates, DM, M Maechler, BM Bolker, and S Walker (2014). “Fitting Linear Mixed-
E ects Models Using lme4”. In: Journal of Statistical Software arXiv, p. 1406.5823. Bergmann, DC and FD Sack (2007). “Stomatal development”. In: Annual Review of Plant Biology 58, pp. 163–181. Bhugra, S, D Mishra, A Anupama, S Chaudhury, B Lall, A Chugh, and V Chinnusamy (2018). “Deep convolutional neural networks based framework for estimation of
stomata density and structure from microscopic images”. In: European Conference on Computer Vision. Springer, pp. 412–423. Deng, J, W Dong, R Socher, LJ Li, K Li, and L Fei-Fei (2009). “Imagenet: A large-
scale hierarchical image database”. In: 2009 IEEE conference on computer vision and pattern recognition. Ieee, pp. 248–255. Duarte, KT, MAG de Carvalho, and PS Martins (2017). “Segmenting high-quality digital images of stomata using the wavelet spot detection and the watershed
99 transform.” In: VISIGRAPP (4: VISAPP). Setubal, Portugal: Science and Tech- nology Publications, Lda, pp. 540–547. Eisele, JF, F Fäßler, PF Bürgel, and C Chaban (Oct. 2016). “A rapid and simple
method for microscopy-based stomata analyses”. In: PLOS ONE 11.10, pp. 1–13. Fischer, R, D Rees, K Sayre, Z Lu, A Condon, and L Saavedra (1998). “Wheat yield progress associated with higher stomatal conductance and photosynthetic rate,
and cooler canopies”. In: Crop science 38.6, pp. 1467–1475. Gelasca, ED, J Byun, B Obara, and BS Manjunath (2008). “Evaluation and bench-
mark for biological image segmentation”. In: Proceedings - International Confer- ence on Image Processing, ICIP, pp. 1816–1819. He, K, X Zhang, S Ren, and J Sun (2016). “Deep residual learning for image recog-
nition”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Los Alamitos, California: IEEE Computer Society, pp. 770–778. Hetherington, AM and FI Woodward (2003). “The role of stomata in sensing and
driving environmental change”. In: Nature 424.6951, p. 901. Higaki, T, N Kutsuna, and S Hasezawa (2014). “CARTA-based semi-automatic de-
tection of stomatal regions on an Arabidopsis cotyledon surface”. In: Plant Mor- phology 26.1, pp. 9–12. Hudson, ME (2008). “Sequencing breakthroughs for genomic ecology and evolutionary
biology”. In: Molecular ecology resources 8.1, pp. 3–17. Jayakody, H, S Liu, M Whitty, and P Petrie (2017). “Microscope image based fully automated stomata detection and pore measurement method for grapevines”. In:
Plant methods 13.1, p. 94.
100 Jia, Y, E Shelhamer, J Donahue, S Karayev, J Long, R Girshick, S Guadarrama, and T Darrell (2014). “Ca e: convolutional architecture for fast feature embedding”.
In: arXiv preprint arXiv:1408.5093. Jian, S, C Zhao, and Y Zhao (2011). “Based on remote sensing processing technol-
ogy estimating leaves stomatal density of Populus euphratica”. In: 2011 IEEE International Geoscience and Remote Sensing Symposium. IEEE, pp. 547–550. Johnson, AC and MJ Cipolla (2017). “Altered hippocampal arteriole structure and function in a rat model of preeclampsia: potential role in impaired seizure-induced
hyperemia”. In: Journal of Cerebral Blood Flow & Metabolism 37.8, pp. 2857–2869. Kim, TH, M Böhmer, H Hu, N Nishimura, and JI Schroeder (2010). “Guard cell
signal transduction network: advances in understanding abscisic acid, CO2, and Ca2+ signaling”. In: Annual review of plant biology 61, pp. 561–591. Koho, S, E Fazeli, JE Eriksson, and PE Hänninen (2016). “Image quality ranking
method for microscopy”. In: Scientific reports 6, p. 28962. Krizhevsky, A, I Sutskever, and GE Hinton (2012). “Imagenet classification with deep
convolutional neural networks”. In: Advances in neural information processing systems, pp. 1097–1105. Laga, H, F Shahinnia, and D Fleury (2014). “Image-based plant stornata phenotyp-
ing”. In: 2014 13th International Conference on Control Automation Robotics and Vision, ICARCV 2014, pp. 217–222. LeCun, Y, Y Bengio, and G Hinton (2015). “Deep learning”. In: Nature 521.7553, p. 436.
101 Liu, C, N He, J Zhang, Y Li, Q Wang, L Sack, and G Yu (2018). “Variation of stomatal traits from cold temperate to tropical forests and association with water-
use e ciency”. In: Functional Ecology 32.1, pp. 20–28. Long, J, E Shelhamer, and T Darrell (2015). “Fully convolutional networks for seman-
tic segmentation”. In: Proceedings of the IEEE conference on computer vision and pattern recognition. Los Alamitos, California: IEEE Computer Society, pp. 3431– 3440. McElwain, JC and M Steinthorsdottir (2017). “Paleoecology, ploidy, paleoatmospheric composition, and developmental biology: a review of the multiple uses of fossil
stomata”. In: Plant Physiology 174.2, pp. 650–664. McKown, AD, RD Guy, L Quamme, J Klápöt , J La Mantia, C Constabel, YA El-Kassaby, RC Hamelin, M Zifkin, and M Azam (2014). “Association genet-
ics, geography and ecophysiology link stomatal patterning in Populus trichocarpa with carbon gain and disease resistance trade-o s”. In: Molecular Ecology 23.23, pp. 5771–5790. Michonneau, F, JW Brown, and DJ Winter (2016). “rotl: an R package to interact
with the Open Tree of Life data”. In: Methods in Ecology and Evolution 7.12, pp. 1476–1481. Morales-Navarro, S, R Pérez-Díaz, A Ortega, A de Marcos, M Mena, C Fenoll, E González-Villanueva, and S Ruiz-Lara (2018). “Overexpression of a SDD1-Like gene from wild tomato decreases stomatal density and enhances dehydration
avoidance in Arabidopsis and cultivated tomato”. In: Frontiers in plant science 9.
102 Oliviera, MWdS, NR da Silva, D Casanova, LFS Pinheiro, RM Kolb, and OM Bruno
(2014). “Automatic counting of stomata in epidermis microscopic images”. In: X Workshop de Visao Computacional 3, pp. 253–257. Papanatsiou, M, A Amtmann, and MR Blatt (2016). “Stomatal spacing safeguards stomatal dynamics by facilitating guard cell ion transport independent of the
epidermal solute reservoir”. In: Plant physiology 172.1, pp. 254–263. Peel, JR, MC Mandujano Sánchez, J López Portillo, and J Golubov (2017). “Stomatal
density, leaf area and plant size variation of Rhizophora mangle (Malpighiales: Rhizophoraceae) along a salinity gradient in the Mexican Caribbean”. In: Revista de Biología Tropical 65.2, pp. 701–712. Poggio, T, K Kawaguchi, Q Liao, B Miranda, L Rosasco, X Boix, J Hidary, and H Mhaskar (2017). “Theory of deep learning III: explaining the non-overfitting
puzzle”. In: arXiv preprint arXiv:1801.00173. RCoreTeam(2013).R: A Language and Environment for Statistical Computing.R Foundation for Statistical Computing. Vienna, Austria. Royer, DL (2001). “Stomatal density and stomatal index as indicators of paleoatmo-
spheric CO2 concentration”. In: Review of Palaeobotany and Palynology 114.1-2, pp. 1–28. Russakovsky, O, J Deng, H Su, J Krause, S Satheesh, S Ma, Z Huang, A Karpathy, A Khosla, M Bernstein, et al. (2015). “Imagenet large scale visual recognition
challenge”. In: International journal of computer vision 115.3, pp. 211–252. Shen, D, G Wu, and HI Suk (2017). “Deep learning in medical image analysis”. In:
Annual review of biomedical engineering 19, pp. 221–248.
103 Shimazaki, Ki, M Doi, SM Assmann, and T Kinoshita (2007). “Light regulation of
stomatal movement”. In: Annual Review of Plant Biology 58, pp. 219–247. Simonyan, K and A Zisserman (2014). “Very deep convolutional networks for large-
scale image recognition”. In: arXiv preprint arXiv:1409.1556. Sumathi, M, V Bachpai, B Deeparaj, A Mayavel, MG Dasgupta, B Nagarajan, D Rajasugunasekar, V Sivakumar, and R Yasodha (2018). “Quantitative trait loci
mapping for stomatal traits in interspecific hybrids of Eucalyptus”. In: Journal of genetics, pp. 1–7. Takahashi, S, K Monda, J Negi, F Konishi, S Ishikawa, M Hashimoto-Sugimoto, N Goto, and K Iba (2015). “Natural variation in stomatal responses to environmental
changes among Arabidopsis thaliana ecotypes”. In: PloS one 10.2, e0117449. Ubbens, JR and I Stavness (2017). “Deep plant phenomics: a deep learning platform
for complex plant phenotyping tasks”. In: Frontiers in plant science 8, p. 1190. Vialet-Chabrand, S and O Brendel (2014). “Automatic measurement of stomatal den-
sity from microphotographs”. In: Trees 28.6, pp. 1859–1865. Vieland, VJ (2001). “The replication requirement”. In: Nature Genetics 29.3, p. 244. Wang, R, G Yu, N He, Q Wang, N Zhao, and Z Xu (2015). “Latitudinal variation of leaf stomatal traits from species to community level in forests: linkage with
ecosystem productivity”. In: Nature Scientific Reports 5.14454, pp. 1–11. Yu, F and V Koltun (2015). “Multi-scale context aggregation by dilated convolutions”.
In: arXiv preprint arXiv:1511.07122.
104 Chapter 3
Trade-offs and selection conflicts in hybrid poplars indicate the stom- atal ratio as an important trait regulating disease resistance.
Karl C. Fettera,1,DavidNelsonb, Stephen R. Kellera,1 aDepartment of Plant Biology, University of Vermont bUniversity of Maryland Center for Environmental Science 1Corresponding authors: Stephen R. Keller (e-mail: [email protected], tel: +1 802 656 2930) 1Karl C. Fetter (e-mail: [email protected])
105 Abstract
Natural selection can remove maladaptive genotypic variance from populations, leav- ing reduced phenotypic variation as a signal of its action. Hybrid populations o er a unique opportunity to study phenotypic variance before selection purifies it, as these populations can have increase genotypic and phenotypic variance than can reveal trade-o s and selection conflicts not visible, or visible to a lesser extent, in unad- mixed populations. Here, we study the interactions between a fungal leaf rust disease (Melampsora medusae) and stomata and ecophysiology traits in a set of hybrid and unadmixed individuals formed by natural matings between Populus balsamifera, P. trichocarpa, P. angustifolia, and P. deltoides. Phenotypes were measured on cloned genotypes grown in a common garden and genotyped at 227K SNPs with GBS. Our analyses indicate hybridization decreases disease resistance and increases the vari- ance of stomatal ratio (SR), or the ratio of upper leaf surface stomata density to total stomata density. Heritability of SR was high in admixed populations (H2 =0.72) and covaries strongly with the proportion of P. balsamifera ancestry in a genome; thus, selection could e ectively reduce disease by selecting for low values of stomatal ratio. However, selection conflicts present in some admixed populations may prevent adap- tation to pathogens as a result of these populations being unable to occupy adaptive trait space along the growth-defense spectrum. These results suggest an important role for SR and stomatal patterning traits as a target of pathogen induced selection to achieve local fitness optima. Additionally, we demonstrate that hybridization is capable of generating, or magnifying maladaptive intermediate phenotypes to reveal trade-o s and selection conflicts. 3.1 Introduction
Trade-o s arise when selection on fitness components is constrained by negative cor- relations between pairs of traits (Stearns, 1989; Schluter et al., 1991). In plants, an important life history trade-o is between size and the age of first reproduction, with delayed reproduction correlating with longer fertility and higher quality o spring that have a greater chance of survival (Stearns, 1989). However, plants must cope with numerous natural enemies (e.g. herbivores and pathogens), and delaying re- production can come at a high cost if diverting limited resources to defense away from reproduction reduces fitness (Obeso, 2002). Depending on pathogen prevalence, plant life histories generally evolve towards increased investment in defensive enzymes and compounds paired with slower growth, or decreased investment in defense paired with faster growth. The fast-slow trade-o has been consistently identified in annuals (Tian et al., 2003), perennials (Messina et al., 2002), and long-lived trees (McKown, Guy, et al., 2014; McKown, Klápöt , et al., 2019), although this hypothesis is not without its critics (Kliebenstein, 2016). At the molecular level, the plant growth-defense trade-o is regulated by cellular signaling and hormonal regulation to direct metabolic activity away from growth and towards defense, or vice versa (Tian et al., 2003; Wang, 2012; Chandran et al., 2014). Host plant species may evolve a variety of costly defenses to combat disease, including the evolution of structural phenotypes to reduce exposure to pathogens (Gonzales-Vigil et al., 2017), immune systems to detect pathogens (Dangl and Jones, 2001) in order to initiate appropriate responses (Cai et al., 2018), and resistance via constitutive or inducible synthesis of defensive compounds (Bailey et al., 2005; Ullah
107 et al., 2018). Hosts may also avoid disease by migrating out of environments where a pathogen spatially or temporally occurs (Altizer et al., 2011), by colonizing new environments where the pathogen is absent (Bruns et al., 2018), or by evolving life history strategies that avoid or compensate for pathogen exposure (Stearns, 1989). The phenotypic expression of growth-defense trade-o s may be altered by hy- bridization between parental species that have evolved di erent strategies of growth or defense. Hybridization is an important mechanism for evolutionary change, and has been implicated in the maintenance of species boundaries due to negative selec- tion on advanced generation hybrids (Christe et al., 2016), introgression of beneficial alleles across species barriers (Chhatre et al., 2018), and, among other evolutionary phenomena, speciation (Mallet, 2007). It is well-known that hybridization changes the mean and variance of traits in hybrid populations with regards to either parental species; indeed, crop breeders intentionally create hybrids to take advantage of hy- brid vigor and transgressive segregation in F1 hybrids to increase yields and select for novel phenotypic combinations (Goulet et al., 2017). Transgressive segregation in hybrids is a source of new phenotypes for selection to act upon in natural populations (Goulet et al., 2017), and the balance between competing fitness components under- lying trade-o s can change in hybrid populations (Ellstrand and Schierenbeck, 2000).
For example, hybrid Salix eriocephala x sericea have an increase in disease severity to willow leaf rust (Melampsora sp.) by a factor of 3.5 in comparison to unadmixed parents (Roche and Fritz, 1998). Thus, it is apparent that hybridization can have beneficial and deleterious e ects on fitness, but the role of hybridization in revealing fitness trade-o s that are not evident, or cryptic, within the parental species has been little explored.
108 One scenario where a trade-o between plant growth and defense may arise is the relationship between stomatal traits and infection by fungal pathogens. Stomata are microscopic valves on the surface of the leaf that regulate gas exchange during photosynthesis. Some foliar pathogens enter their hosts via stoma by sensing the topography of the leaf surface for guard cells, and forming an appressorium over a stoma from which a penetration peg is used to invade the mesophyll tissue (Allen et al., 1991). The plant immune system has evolved mechanisms for detecting pathogens which can activate signal transduction pathways that lead stomatal closure (Melotto et al., 2006). However, closing stomata comes at the expense of limiting gas exchange, which may have negative e ects on carbon assimilation depending on the water-use e ciency of photosynthesis. Thus, the conditions for the evolution of a growth-defense trade-o mediated by selection on stomata exists. Hybridization may be expected to shift genotypes along the trade-o continuum, and when competing strategies inherited from either parent meet in a hybrid, we may expect a mismatch between the optimal survival and growth strategy to result in increased disease. In this study, we test for the e ects of hybridization on disease severity and stom- atal related growth traits, and investigate if hybridization reveals trade-o s and se- lection conflicts that may be masked in unadmixed populations. We use a sample of naturally formed hybrid genotypes between North American poplars infected by a common leaf rust of poplars, Melampsora medusae. We specifically ask the following questions: 1) how does hybridization change trait distributions, and does it alter her- itable variation; 2) which traits correlate most strongly with disease severity; 3) are there selection trade-o s between growth, disease severity, and stomatal traits; and 4) does hybridization reveal selection conflicts between traits underlying growth and
109 defense?
3.2 Materials & Methods
3.2.1 Study system
Poplars (Populus) are a genus of predominantly holarctic tree species (DiFazio et al., 2011). Extensive hybridization between species within a section of the genus, as well as between some sections, make the taxonomy of the genus di cult (Ecken- walder, 1996), with some authors identifying 29 (Eckenwalder, 1996) to as many as 60 species (Raven and Wu, 1999). Hybrids can be formed from species pairs within and between Populus sections Tacamahaca and Aigeiros, and extensive hybrid zones spontaneously form in where the ranges of two reproductively compatible species meet (Suarez-Gonzalez, Lexer, et al., 2018). Western North America contains sev- eral well documented hybrid zones (Martinsen et al., 2001; LeBoldus et al., 2013; Chhatre et al., 2018; Suarez-Gonzalez, Hefer, Lexer, Douglas, et al., 2018), including a tri-hybdrid zone in Alberta, Canada (Floate et al., 2016). In this study, we work with hybrids formed from crosses between the section Tacamahaca species balsam poplar (P. balsamifera), and either a black balsam poplar (P. trichocapra), a narrow- leaf cottonwood (P. angustifolia), or an eastern cottonwood (P. deltoides) from the Populus section Aigeiros. We lack information on the which species served as the maternal vs. paternal parent for each hybrid, so we adopt the convention of listing the P. balsamifera parent first. The disease we study is from the fungal leaf pathogen Melampsora medusae,a
110 macrocyclic basidiomycete whose aecial host is a larch (Larix), and telial host is a poplar (Feau et al., 2007). Uredospores (N + N) of M. medusae can clonally reproduce on poplar leaves within a single season, and the closely related M. larici-populina is an agricultural pest that can reduce yields of hybrid poplars grown for agroforestry (Pinon and Frey, 2005).
3.2.2 Materials
During winter 2013, dormant stem cuttings were collected from 534 trees from 59 pop- ulations spanning 9 Canadian provinces and 7 US States (longitudes -55 to -128 °W and latitudes 39 to 60 °N; Fig. 3.1, Table S3.1). The main focus of the 2013 collection was to sample P. balsamifera cuttings, but as the trees were dormant and some of the diagnostic traits were not immediately visible, a number of putative hybrids were also collected. Additionaly, 32 presumably unadmixed eastern cottonwood (P. deltoides) genotypes were collected in central Vermont, USA to serve as a reference popula- tion for identifying admixed P. deltoides hybrids with population genetic methods. For the 2013 collection, cuttings were grown for one year in a greenhouse, and then planted in the summer of 2014 in a common garden near Burlington, VT (44.444422 °N, -73.190164 °W). Replicates were planted in a randomized design with 2x2 meter spacing and 1,000 ramets were planted. The minimum, median, and maximum num- ber of ramets per genet was 1, 2, and 6, respectively. Plants were not fertilized, but were irrigated as-needed during the 2014 growing season to ensure establishment, and then received no supplemental water.
111 3.2.3 Molecular analyses
Fresh foliage from greenhouse grown plants was used for extracting whole genomic DNA using DNeasy 96 Plant Mini Kits (Qiagen, Valencia, CA, USA). DNA was quan- tified using a fluorometric assay (Qubit BR, Invitrogen) and confirmed for high molec- ular weight using 1% agarose gel electrophoresis. We used genotyping-by-sequencing (GBS) (Elshire et al., 2011) to obtain genome-wide polymorphism data for all 534 trees. Genomic sequencing libraries were prepared from 100 ng of genomic DNA per sample digested with EcoT221 followed by ligation of barcoded adapters of varying length from 4-8 bp, following Elshire et al. (2011). Equimolar concentrations of barcoded fragments were pooled and purified with QIAquick PCR purification kit. Purified products were amplified with 18 PCR cycles to append Illumina sequenc- ing primers, cleaned again using a PCR purification kit. The resulting library was screened for fragment size distribution using a Bioanalyzer. Libraries were sequenced at 48 plex (i.e., each library sequenced twice) using an Illumina HiSeq 2500 to generate 100 bp single end reads. Cornell University Institute of Genomic Diversity (Ithaca,
NY) performed the library construction and sequencing steps. For the P. deltoides reference population, library preparation was performed in the Keller lab using the same protocol. DNA sequencing was performed on an Illumina HiSeq 2500 at the Vermont Genetics Network core facility. Raw sequences reads are deposited in NCBI SRA under accession number SRP070954. We employed the Tassel GBS Pipeline (Glaubitz et al., 2014) to process raw sequence reads and call variants and genotypes. In order to pass the quality control, sequence reads had to have perfect barcode matches, the presence of a restriction
112 site overhang and no undecipherable nucleotides. Filtered reads were trimmed to
64 bp and aligned to the P. trichocarpa reference assembly version 3.0 (Tuskan et al., 2006) using the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2009). Single nucleotide polymorphisms (SNPs) were determined based on aligned positions to the reference, and genotypes called with maximum likelihood in Tassel (Glaubitz et al., 2014). SNP genotype and sequence quality scores were stored in Variant Call Format v4.1 (VCF) files, which were further processed with VCFTools 0.1.11 (Danecek et al., 2011). SNPs with a minor allele frequency < 0.001 were removed, and only biallelic sites were retained. Sites with with a mean depth < 5, genotype quality > 90, and indels were removed. Missing data was imputed with Beagle v5.0 (Browning et al., 2018), and sites with post-imputation genotype probability < 90 and sites with any missingness were removed. After filtering, the final dataset contained 227,607 SNPs for downstream analyses. The filial generation was estimated using NewHybrids (Anderson and Thompson, 2002) separately for each set of hybrids, (i.e. BxT, BxA, and BxD). Filial generation of P. balsamifera x angustifolia genotypes were previously estimated by Chhatre et al. (2018) and used here. NewHybrids requires reference populations for each species, and
P. trichocarpa reference genotypes were downloaded from previously published work, and we selected 25 genotypes from Pierce Co, Washington in populations known to lack admixture with P. balsamifera (Evans et al., 2014). P. deltoides reference genotypes were collected from the Winooski and Mad River watersheds in central
Vermont. The reference population for P. balsamifera was selected from individuals in the SLC, LON, and DPR populations that are known to lack admixture with other Populus species (see below). To select loci for distinguishing filial generations,
113 we determined the locus-wise FST di erence between reference populations. For the
BxT analysis, 355 loci with an FST di erence greater than 0.8 were randomly selected. In the BxD analysis, 385 loci that segregated completely between parental species (i.e.
FST di erence = 1) were randomly sampled. NewHybrids was run using Je rey’s prior for fi and ◊ for 200,000 sweeps with 100,000 discarded as burn-in. The cross type (BxB, BxT, BxA, or BxD) was determined from the NewHybrids result and the expected proportion of the genome from each parental species was calculated as:
(2n 1) Expected ancestry proportion = ≠ 2n
where n = the number crossing events.
3.2.4 Trait measurements
All traits were measured from common garden grown trees in 2015, and disease sever- ity was measured again in 2016 (Table 3.1). The severity of naturally innoculated leaf rust disease was phenotyped in both years using an ordinal scale from zero to four created by LaMantia et al. (2013) where: 0 = no uredinia visible, 1 = less than five uredinia per leaf on less than five leaves, 2 = less than five uredinia per leaf on more than five leaves, 3 = more than five uredinia per leaf on more than five leaves, and 4 = more than five uredinia on all leaves. In 2016, we also scored disease using the method of Dowkiw and Bastien (2004), where the single most infected leaf on a tree was classified as: 1 = no uredinia, 2 = 1 to 10 uredinia, 3 = 11 uredinia to 25 % of the leaf area, 4 = 25 to 50 % of the leaf area, 5 = 50 to 75 % of the leaf area, and 6 75 % of the leaf area covered in uredinia. We refer to the ordinal scale Ø
114 of La Mantia et al. (2013) as ordinal scale 1 and the ordinal scale of Dowkiw and Bastien (2004) as ordinal scale 2. A mature larch tree (approx. 20m tall) was located approximately 100m from the garden site, and we assume a uniform distribution of aeciospore inoculum into the garden. The pathogen was visually confirmed to be M. medusae using microscopy to view the ellipsoid to obovoid shape of uredospores, the size range (min = 19.6 µ m, mean = 28.3 µ m, max = 34.45 µ m, N = 17, Fig. 3.1f), and the presence of a smooth equatorial region on the spore flanked by polar regions with papillae (Van Kraayenoord et al., 1974) (Fig. 3.1f). To collect isotopic, elemental, and specific leaf area (SLA) data, three hole punches ( = 3 mm) were sampled in June 2015 from a central portion of each leaf adjacent to, but avoiding the central leaf vein. Hole punches were dried at 65 °C to constant mass. Approximately 2 mg of foliar tissue from each sample was weighed into a tin capsule and analyzed for %C, %N, ”13C, and ”15N using a Carlo Erba NC2500 elemental an- alyzer (CE Instruments, Milano, Italy) interfaced with a ThermoFinnigan Delta V+ isotope ratio mass spectrometer (Bremen, Germany) at the Central Appalachians Stable Isotope Facility (CASIF) at the Appalachian Laboratory (Frostburg, Mary- land, USA). %C and %N were calculated using a size series of atropine. The ”13C and ”15N data were normalized to the VPDB and AIR scales, respectively, using a two-point normalization curve with laboratory standards calibrated against USGS40 and USGS41. The long-term precision of an internal leaf standard analyzed along- side samples was 0.28% for ”13C and 0.24% for ”15N. Isotopic results are reported in units per mil (% ) and are expressed as:
”13C =[(13C/12C )/(13C/12C ) 1] 103 sample V-PDB ≠ ú
115 where ”13C is the isotopic composition of the sample in % , and V-PDB is the Vienna-Pee Dee Belemnite international standard. Carbon isotope discrimination against 13C( 13C) was calculated according to Farquhar et al. (1982) as: