Genetic architecture of water-use efficiency

Running Header: Genetic architecture of water-use efficiency

SUPPLEMENTARY MATERIALS

The genetic architecture of local adaptation II: The QTL landscape of water-use efficiency for foxtail pine (Pinus balfouriana Grev. & Balf.)

Andrew J. Eckert¹*, Douglas E. Harwood², Brandon M. Lind², Erin M. Hobson¹, Annette Delfino Mix³, Patricia E. Maloney⁴, and Christopher J. Friedline¹

¹Department of Biology, Virginia Commonwealth University, Richmond, VA 23284

²Integrative Life Sciences Program, Virginia Commonwealth University, Richmond, VA 23284

³Institute of Forest Genetics, USDA Pacific Southwest Research Station, Placerville, CA 95667

⁴Department of Pathology, University of California, Davis, CA 95616


Supplemental Tables

Table S1. Definitions of the 19 bioclimatic variables (Bio) used to construct species distribution models (SDMs). More information is available at: http://www.worldclim.org/bioclim

Bio    Definition                            Variable type
Bio1   Annual Mean Temperature               Temperature
Bio2   Mean Diurnal Range                    Temperature
Bio3   Isothermality                         Temperature
Bio4   Temperature Seasonality               Temperature
Bio5   Max Temperature of Warmest Month      Temperature
Bio6   Min Temperature of Coldest Month      Temperature
Bio7   Temperature Annual Range              Temperature
Bio8   Mean Temperature of Wettest Quarter   Both
Bio9   Mean Temperature of Driest Quarter    Both
Bio10  Mean Temperature of Warmest Quarter   Temperature
Bio11  Mean Temperature of Coldest Quarter   Temperature
Bio12  Annual Precipitation                  Precipitation
Bio13  Precipitation of Wettest Month        Precipitation
Bio14  Precipitation of Driest Month         Precipitation
Bio15  Precipitation Seasonality             Precipitation
Bio16  Precipitation of Wettest Quarter      Precipitation
Bio17  Precipitation of Driest Quarter       Precipitation
Bio18  Precipitation of Warmest Quarter      Both
Bio19  Precipitation of Coldest Quarter      Both

Both = related to both temperature and precipitation.
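Most entries in Table S1 are simple summaries of 12 monthly climate values. The sketch below derives a subset of them from monthly minimum temperature, maximum temperature, and precipitation. It assumes conventions like those of `dismo::biovars` in R (monthly mean temperature taken as the midpoint of the monthly minimum and maximum; a quarter is any three consecutive months, wrapping from December into January). The function `bioclim_subset` is hypothetical and is not the code used to build the WorldClim layers; Bio4 and Bio15 are omitted because their scaling conventions differ between implementations.

```python
from statistics import mean


def _quarter_sums(values):
    # Sum of every run of three consecutive months, wrapping Dec -> Jan -> Feb.
    ext = list(values) + list(values[:2])
    return [sum(ext[i:i + 3]) for i in range(12)]


def bioclim_subset(tmin, tmax, prec):
    """Hypothetical helper: a subset of the Table S1 variables from 12 monthly
    values (January..December) of minimum temperature, maximum temperature
    and precipitation."""
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per input")
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]  # assumed monthly mean
    t_q = [s / 3 for s in _quarter_sums(tavg)]            # quarterly mean temperature
    p_q = _quarter_sums(prec)                             # quarterly precipitation total

    wettest = max(range(12), key=p_q.__getitem__)
    driest = min(range(12), key=p_q.__getitem__)
    warmest = max(range(12), key=t_q.__getitem__)
    coldest = min(range(12), key=t_q.__getitem__)

    bio = {}
    bio[1] = mean(tavg)                                   # annual mean temperature
    bio[2] = mean(hi - lo for lo, hi in zip(tmin, tmax))  # mean diurnal range
    bio[5] = max(tmax)                                    # max temp of warmest month
    bio[6] = min(tmin)                                    # min temp of coldest month
    bio[7] = bio[5] - bio[6]                              # temperature annual range
    bio[3] = 100 * bio[2] / bio[7]                        # isothermality
    bio[8] = t_q[wettest]
    bio[9] = t_q[driest]
    bio[10] = t_q[warmest]
    bio[11] = t_q[coldest]
    bio[12] = sum(prec)
    bio[13] = max(prec)
    bio[14] = min(prec)
    bio[16] = p_q[wettest]
    bio[17] = p_q[driest]
    bio[18] = p_q[warmest]
    bio[19] = p_q[coldest]
    return bio
```

For example, `bioclim_subset([0] * 12, [10] * 12, [10, 20, 30, 0, 0, 0, 0, 0, 0, 0, 0, 40])` places the wettest quarter at December to February (70 units), because the quarters wrap across the year boundary.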


Table S2. Variable contribution (VC) and permutation importance (PI) scores for each bioclimatic variable (Bio) in each species distribution model (SDM). Values are means or standard deviations (sd) across the 10 replicate runs. KM = Klamath Mountains; SN = southern Sierra Nevada.

       KM SDM                                      SN SDM
Bio    VC (mean)  VC (sd)    PI (mean)  PI (sd)    VC (mean)  VC (sd)    PI (mean)  PI (sd)
Bio1   9.4669     0.9973     69.3646    0.0418     1.0681     0.9973     0.0684     0.0418
Bio2   0.6066     1.8227     0.2639     19.2140    41.3070    1.8227     10.9376    19.2140
Bio3   7.7503     1.0824     2.0402     2.3958     25.4156    1.0824     4.1491     2.3958
Bio4   1.5862     0.0272     9.6481     0.4408     0.1172     0.0272     1.6714     0.4408
Bio5   7.2930     2.8700     0.0000     0.1336     20.8623    2.8700     0.0988     0.1336
Bio6   2.3979     0.0787     1.0080     0.4417     0.0818     0.0787     0.1551     0.4417
Bio7   2.0505     0.1575     0.0000     0.0437     0.1638     0.1575     0.0242     0.0437
Bio8   0.0281     0.1454     0.0000     12.7328    0.2716     0.1454     34.3415    12.7328
Bio9   0.4293     0.2292     0.4839     0.6193     0.3272     0.2292     1.5396     0.6193
Bio10  1.4292     1.1211     0.6715     0.0150     3.1607     1.1211     0.0057     0.0150
Bio11  0.3747     0.1462     0.4255     0.1171     0.1865     0.1462     0.0747     0.1171
Bio12  5.8994     0.1988     0.0410     2.3376     0.2082     0.1988     3.7135     2.3376
Bio13  0.0341     0.0606     0.0240     0.4824     0.0924     0.0606     0.5832     0.4824
Bio14  1.3270     0.5418     6.0128     0.5319     2.5582     0.5418     1.3117     0.5319
Bio15  0.3237     0.4655     0.3469     11.9870    2.8783     0.4655     38.7247    11.9870
Bio16  20.8919    0.1043     1.3043     0.0559     0.0557     0.1043     0.0412     0.0559
Bio17  37.7992    0.0660     6.9994     0.6256     0.1596     0.0660     1.0055     0.6256
Bio18  0.3012     0.1371     1.3658     0.1851     0.1777     0.1371     0.3119     0.1851
Bio19  0.0110     0.2424     0.0000     1.2179     0.9081     0.2424     1.2421     1.2179
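Because VC and PI are expressed as percentages, each mean column of Table S2 should sum to roughly 100 across the 19 variables. The sketch below checks this and lists the top-ranked variables per column. It also shows how raw importances are rescaled to percentages; in MaxEnt, for instance, the raw PI value is the drop in training AUC after permuting a variable, but the table itself does not state which software produced these scores, so that link is an assumption. `to_percent` and `top_variables` are hypothetical helper names.

```python
# Mean VC and PI scores copied from Table S2, ordered Bio1..Bio19.
KM_VC = [9.4669, 0.6066, 7.7503, 1.5862, 7.2930, 2.3979, 2.0505, 0.0281, 0.4293,
         1.4292, 0.3747, 5.8994, 0.0341, 1.3270, 0.3237, 20.8919, 37.7992, 0.3012, 0.0110]
KM_PI = [69.3646, 0.2639, 2.0402, 9.6481, 0.0000, 1.0080, 0.0000, 0.0000, 0.4839,
         0.6715, 0.4255, 0.0410, 0.0240, 6.0128, 0.3469, 1.3043, 6.9994, 1.3658, 0.0000]
SN_VC = [1.0681, 41.3070, 25.4156, 0.1172, 20.8623, 0.0818, 0.1638, 0.2716, 0.3272,
         3.1607, 0.1865, 0.2082, 0.0924, 2.5582, 2.8783, 0.0557, 0.1596, 0.1777, 0.9081]
SN_PI = [0.0684, 10.9376, 4.1491, 1.6714, 0.0988, 0.1551, 0.0242, 34.3415, 1.5396,
         0.0057, 0.0747, 3.7135, 0.5832, 1.3117, 38.7247, 0.0412, 1.0055, 0.3119, 1.2421]


def to_percent(raw):
    """Rescale non-negative raw importances so that they sum to 100."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("raw importances must have a positive sum")
    return [100 * r / total for r in raw]


def top_variables(scores, k=3):
    """Names of the k bioclimatic variables with the largest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [f"Bio{i + 1}" for i in ranked[:k]]


for name, col in [("KM VC", KM_VC), ("KM PI", KM_PI), ("SN VC", SN_VC), ("SN PI", SN_PI)]:
    print(f"{name}: sum = {sum(col):.4f}; top = {top_variables(col)}")
```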


Table S3. Summary of annotations (InterPro domains) reported by Friedline et al. (2015) for RADtags within 3 cM of the QTL positions in Table 5. Unique annotations are separated by a semicolon.

Trait  LGᵃ  Position (cM)  RADtagsᵇ  P. taedaᶜ  Putative genesᵈ  Putative annotationsᵉ

δ15N   1    0              356       119        7                Uncharacterized family FPL; SANT/Myb domain|Myb-like domain; Winged helix-turn-helix DNA-binding domain|Peptidase M24, structural domain|Peptidase M24A, methionine aminopeptidase, subfamily 2; Glycoside hydrolase, family 3, N-terminal|Glycoside hydrolase, superfamily|Glycoside hydrolase family 3 C-terminal domain; GDP-fucose protein O-fucosyltransferase; SS18 family|Crotonase superfamily; DNA mismatch repair protein MutS, clamp|DNA mismatch repair protein MutS, C-terminal|DNA mismatch repair protein MutS-like, N-terminal|DNA mismatch repair protein MutS, core|DNA mismatch repair protein MutS, connector domain

δ15N   1    79             35        14         2                Pentatricopeptide repeat; UDP-glucose 4-epimerase GalE|NAD(P)-binding domain

δ13C   1    13             144       40         3                Leucine-rich repeat|Leucine-rich repeat, typical subtype|Serine/threonine-/dual specificity protein kinase, catalytic domain|Concanavalin A-like lectin/glucanase, subgroup|Protein kinase, ATP binding site|Tyrosine-protein kinase, catalytic domain|Protein kinase domain|Protein kinase-like domain|Serine-threonine/tyrosine-protein kinase catalytic domain|Serine/threonine-protein kinase, active site|Leucine-rich repeat-containing N-terminal, type 2; Zinc finger, C3HC-like; RWP-RK domain|Phox/Bem1p

δ13C   1    98             76        24         0                NA

δ13C   2    66             52        18         2                Class II glutamine amidotransferase domain|Alpha-helical ferredoxin|Glutamate synthase, central-N|Glutamate synthase, alpha subunit, C-terminal|Aldolase-type TIM barrel|Glutamine amidotransferase type 2 domain|Glutamate synthase, central-C|FAD-dependent pyridine nucleotide-disulphide oxidoreductase|Glutamate synthase, NADH/NADPH, small subunit 1; Protein kinase, ATP binding site|Serine/threonine-protein kinase, active site|Tyrosine-protein kinase, catalytic domain|Serine/threonine-/dual specificity protein kinase, catalytic domain|Concanavalin A-like lectin/glucanase, subgroup|Protein kinase domain|Protein kinase-like domain

δ13C   2    77             102       30         6                Leucine-rich repeat, cysteine-containing subtype; Amino acid transporter, transmembrane; Ribosomal protein L4/L1e|Ribosomal protein L4/L1e, bacterial-type; NAD(P)-binding domain|Multi antimicrobial extrusion protein; Tetratricopeptide-like helical; Pseudouridine synthase, catalytic domain|Pseudouridine synthase I, TruA|Pseudouridine synthase I, TruA, C-terminal

δ13C   3    14             113       37         2                maturation protein SBDS|Ribosome maturation protein SBDS, conserved site|Ribosome maturation protein SBDS, C-terminal|Ribosome maturation protein SBDS, N-terminal; Leucine-rich repeat

δ13C   3    34             62        22         3                DNA topoisomerase I|DNA breaking-rejoining enzyme, catalytic core|DNA topoisomerase I, DNA binding, eukaryotic-type|DNA topoisomerase I, DNA binding, mixed alpha/beta motif, eukaryotic-type|DNA topoisomerase I, catalytic core, eukaryotic-type|DNA topoisomerase I, catalytic core, alpha/beta subdomain|DNA topoisomerase I, active site|DNA topoisomerase I, domain 1|DNA topoisomerase I, eukaryotic-type|DNA topoisomerase I, catalytic core, alpha-helical subdomain, eukaryotic-type; Haem peroxidase, plant/fungal/bacterial|Haem peroxidase|Peroxidases heam-ligand binding site; Iron hydrogenase, large subunit, C-terminal|Iron hydrogenase

δ15N   3    35             70        25         4                DNA topoisomerase I|DNA breaking-rejoining enzyme, catalytic core|DNA topoisomerase I, DNA binding, eukaryotic-type|DNA topoisomerase I, DNA binding, mixed alpha/beta motif, eukaryotic-type|DNA topoisomerase I, catalytic core, eukaryotic-type|DNA topoisomerase I, catalytic core, alpha/beta subdomain|DNA topoisomerase I, active site|DNA topoisomerase I, domain 1|DNA topoisomerase I, eukaryotic-type|DNA topoisomerase I, catalytic core, alpha-helical subdomain, eukaryotic-type; Haem peroxidase, plant/fungal/bacterial|Haem peroxidase|Peroxidases heam-ligand binding site; Iron hydrogenase, large subunit, C-terminal|Iron hydrogenase; Mediator complex, subunit Med14

δ15N   3    52             101       29         6                RNA polymerase II, heptapeptide repeat, eukaryotic|RNA polymerase Rpb1, domain 5; PUB domain|Zinc finger, C2H2|UBX|PUG domain; Aldehyde dehydrogenase domain|Aldehyde dehydrogenase NAD(P)-dependent|Aldehyde/histidinol dehydrogenase|Aldehyde dehydrogenase, C-terminal; RNA recognition motif domain|NAD(P)-binding domain|Nucleotide-binding, alpha-beta plait; Pentatricopeptide repeat|Tetratricopeptide-like helical; Cation/H+ exchanger|Cation/H+ exchanger

δ13C   5    64             42        13         2                Molybdopterin biosynthesis MoaE|Molybdopterin biosynthesis MoaE|Molybdopterin biosynthesis MoaE|Molybdopterin biosynthesis MoaE; Lipase, class 3

δ13C   6    46             91        30         4                Concanavalin A-like lectin/glucanases superfamily|Serine/threonine-/dual specificity protein kinase, catalytic domain|Legume lectin, beta chain, Mn/Ca-binding site|Legume lectin domain|Concanavalin A-like lectin/glucanase, subgroup|Tyrosine-protein kinase, catalytic domain|Protein kinase domain|Protein kinase-like domain|Serine/threonine-protein kinase, active site; PAR1; Transcription factor CBF/NF-Y/archaeal histone|Histone-fold; G10 protein|BUD31/G10-related, conserved site

δ13C   6    56             54        22         2                EF-hand domain pair|Endonuclease/exonuclease/phosphatase|EF-Hand 1, calcium-binding site; Ubiquitin-like 1 activating enzyme, catalytic cysteine domain|NAD(P)-binding domain|Molybdenum cofactor biosynthesis, MoeB

δ15N   7    62             87        34         3                HAD-like domain; Cytochrome P450, E-class, group I|Cytochrome P450, conserved site|Cytochrome P450; Protein kinase, ATP binding site|Serine/threonine-protein kinase, active site|Tyrosine-protein kinase, catalytic domain|Serine/threonine-/dual specificity protein kinase, catalytic domain|Concanavalin A-like lectin/glucanase, subgroup|Protein kinase domain|Protein kinase-like domain

δ15N   7    80             68        23         3                Protein kinase domain|Protein kinase-like domain|Serine-threonine/tyrosine-protein kinase catalytic domain; Zinc finger, RING-type|Zinc finger, C3HC4 RING-type|Zinc finger, RING/FYVE/PHD-type; Glycoside hydrolase, family 85

δ15N   8    68             57        22         0                NA

δ15N   8    71             80        28         0                NA

δ15N   9    64             74        29         2                Ribosomal protein L38e|Cytochrome P450; Thiolase-like|Thiolase, N-terminal|Thiolase

δ15N   9    95             238       81         9                Citrate synthase, type II|Citrate synthase active site|Citrate synthase-like|Citrate synthase-like, large alpha subdomain|Citrate synthase-like, small alpha subdomain|Citrate synthase-like, core; G-patch domain; Pollen Ole e 1 allergen/extensin; Protein kinase, ATP binding site|Serine/threonine-protein kinase, active site|Tyrosine-protein kinase, catalytic domain|Serine/threonine-/dual specificity protein kinase, catalytic domain|Protein kinase domain|Protein kinase-like domain; ARID/BRIGHT DNA-binding domain|High mobility group box domain; Homeodomain-like|SANT/Myb domain|Myb domain; Lipase, class 3; NAD(P)-binding domain; RNA recognition motif domain|Nucleotide-binding, alpha-beta plait|Nuclear transport factor 2|Nuclear transport factor 2, eukaryote

δ13C   12   23             172       58         4                Protein of unknown function DUF247, plant; Myc-type, basic helix-loop-helix (bHLH) domain; UDP-glucuronosyl/UDP-glucosyltransferase|UDP-glucuronosyl/UDP-glucosyltransferase; Serine/threonine-protein kinase, active site|Tyrosine-protein kinase, catalytic domain|Serine/threonine-/dual specificity protein kinase, catalytic domain|Protein kinase domain|Protein kinase-like domain

δ13C   12   43             28        6          2                Alpha/beta hydrolase fold-1; ATPase, AAA-type, core|DNA polymerase III, gamma subunit, domain III|DNA polymerase III, subunit gamma/tau
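Table S3 is built by collecting the RADtags that lie within 3 cM of each QTL position. A minimal sketch of that window selection follows. It assumes markers are stored as (name, linkage group, position in cM) records and treats the 3-cM boundary as inclusive; both choices, the `Marker` type, and the marker names and positions in the example are hypothetical illustrations rather than data or code from this study.

```python
from typing import NamedTuple


class Marker(NamedTuple):
    name: str
    lg: int          # linkage group
    pos_cm: float    # map position in centimorgans


def markers_near_qtl(markers, qtl_lg, qtl_pos_cm, window_cm=3.0):
    """Markers on the QTL's linkage group within window_cm of the QTL position."""
    return [m for m in markers
            if m.lg == qtl_lg and abs(m.pos_cm - qtl_pos_cm) <= window_cm]


# Hypothetical markers, for illustration only.
markers = [Marker("tag_a", 1, 0.0), Marker("tag_b", 1, 2.9),
           Marker("tag_c", 1, 3.5), Marker("tag_d", 2, 1.0)]
print([m.name for m in markers_near_qtl(markers, qtl_lg=1, qtl_pos_cm=0.0)])
# ['tag_a', 'tag_b']
```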

Supplemental Figures

Figure S1. The geographical distribution of foxtail pine as defined by Little (1971). The GIS shapefile used to create this figure was obtained from the USGS Geosciences and Environmental Change Science Center (http://esp.cr.usgs.gov/data/little/).


Figure S2. Number of SNPs with quality > 20 as a function of the percent of samples missing a genotype call for that SNP.
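The curve in Figure S2 counts how many SNPs pass the quality filter (QUAL > 20) as the tolerated percentage of samples lacking a genotype call increases. The sketch below computes such counts, assuming each SNP is represented as a (QUAL, list of genotype calls) pair with `None` marking a missing call; that representation, the helper names, and the example SNPs are hypothetical.

```python
def missing_fraction(calls):
    """Fraction of samples without a genotype call (None) at one SNP."""
    return sum(c is None for c in calls) / len(calls)


def snps_retained(snps, thresholds, min_qual=20.0):
    """For each missingness threshold (percent), count SNPs with
    QUAL > min_qual and at most that percentage of samples missing."""
    passing = [missing_fraction(calls) for qual, calls in snps if qual > min_qual]
    return {t: sum(100 * f <= t for f in passing) for t in thresholds}


# Hypothetical SNPs: (QUAL, genotype calls for four samples).
snps = [
    (35.0, ["0/1", "0/0", "1/1", "0/1"]),   # 0% missing
    (50.0, ["0/1", None, "0/0", "0/0"]),    # 25% missing
    (22.0, [None, None, "0/1", "1/1"]),     # 50% missing
    (12.0, ["0/0", "0/1", "0/1", "0/0"]),   # fails the QUAL filter
]
print(snps_retained(snps, thresholds=[0, 25, 50, 100]))
# {0: 1, 25: 2, 50: 3, 100: 3}
```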


Figure S3. Variant calling statistics for n = 11,943 biallelic SNPs called for n = 182 individuals across five families. QUAL is the Phred-scaled quality score for calling the alternate allele (filtered with --minQ 20), MQ is the RMS mapping quality, and DP is the combined depth across samples.
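Two of the statistics in Figure S3 follow standard definitions: a Phred-scaled score Q corresponds to an error probability of 10^(−Q/10), so a QUAL threshold of 20 corresponds to an estimated error probability of 1%, and MQ is the root mean square of the mapping qualities of the reads at a site. The sketch below computes both; the function names and the per-read mapping qualities in the example are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred-scaled quality score to an error probability."""
    return 10 ** (-q / 10)


def rms_mapping_quality(mapqs):
    """Root mean square of per-read mapping qualities at one site."""
    if not mapqs:
        raise ValueError("no reads at this site")
    return math.sqrt(sum(m * m for m in mapqs) / len(mapqs))


print(phred_to_error_prob(20))                # 0.01
print(rms_mapping_quality([60, 60, 20, 40]))  # hypothetical read MAPQs
```

The RMS is used instead of the arithmetic mean because it weights high-quality reads more heavily, so a few poorly mapped reads pull MQ down less.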


Figure S4. Mean receiver operating characteristic (ROC) curves for each species distribution model (SDM) for each regional population (top = Klamath Mountains, bottom = southern Sierra Nevada).
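Each ROC curve in Figure S4 traces the true positive rate against the false positive rate as the suitability threshold of an SDM is lowered, and the area under the curve (AUC) summarizes how well the model separates occurrences from background. The sketch below builds an empirical ROC curve from predicted suitability at presence and background points and integrates it with the trapezoidal rule. Treating background points as negatives follows common presence-background practice and is an assumption here; the scores in the example are invented.

```python
def roc_curve(presence, background):
    """Empirical ROC points (FPR, TPR), thresholding at every observed score;
    presence scores are positives and background scores are negatives."""
    thresholds = sorted(set(presence) | set(background), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in presence) / len(presence)
        fpr = sum(s >= t for s in background) / len(background)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under a ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


presence = [0.9, 0.8, 0.7, 0.4]         # hypothetical suitability at occurrences
background = [0.6, 0.5, 0.3, 0.2, 0.1]  # hypothetical suitability at background points
print(auc(roc_curve(presence, background)))
```

The trapezoidal AUC equals the probability that a randomly chosen presence point scores higher than a randomly chosen background point (ties counted as one half); here 18 of the 20 pairs are ordered correctly.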


Figure S5. Differences between the ecological niches of regional populations of foxtail pine are revealed using a biplot from a principal components analysis (PCA) based on 19 bioclimatic variables (BC) extracted from GIS layers for the 209 observed occurrences (n = 65 in the Klamath Mountains and n = 144 in the southern Sierra Nevada) and 10,000 randomly sampled background points (n = 5,000 points per regional population). Illustrated are the first two principal components (PCs), which together explained 84.26% of the variance. Observed occurrences for foxtail pine in the Klamath Mountains are shown as filled blue circles, whereas observed occurrences in the southern Sierra Nevada are shown as filled red circles. Average values for the first two PCs are plotted as triangles for each regional population (blue = Klamath Mountains, red = southern Sierra Nevada). Background points for the Klamath Mountains are shown as filled salmon circles, whereas background points for the southern Sierra Nevada are shown as filled green circles. Vectors for each bioclimatic variable are colored by whether it is related to temperature (orange), precipitation (blue), or both temperature and precipitation (purple). Shown above and to the right of the biplot are the loadings of the 19 bioclimatic variables on the first (left to right: BC1 to BC19) and second (top to bottom: BC1 to BC19) PCs, respectively. Colors in the bar plots have the same meaning as colors for the vectors in the biplot.
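The share of variance explained by each PC in Figure S5 is the corresponding eigenvalue of the variables' covariance matrix divided by the sum of all eigenvalues. The stdlib-only sketch below assumes the PCA was run on standardized variables (so the matrix is a correlation matrix whose eigenvalues sum to the number of variables); the caption does not state this, and the helper names and any input data are hypothetical. It finds the leading eigenvalues by power iteration with deflation, which is adequate for a handful of components but is not how production PCA routines work.

```python
import math
import random


def standardize(col):
    """Center a variable and scale it to unit (sample) standard deviation."""
    m = sum(col) / len(col)
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1))
    return [(x - m) / sd for x in col]


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of variables (one list per variable)."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def _matvec(a, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in a]


def leading_eigenvalues(mat, k, iters=500, seed=1):
    """Top-k eigenvalues of a symmetric positive semidefinite matrix,
    by power iteration followed by deflation."""
    rng = random.Random(seed)
    a = [row[:] for row in mat]
    p = len(a)
    values = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = _matvec(a, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:            # remaining matrix is numerically zero
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, _matvec(a, v)))  # Rayleigh quotient
        values.append(lam)
        # Deflate: remove the component just found before the next search.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return values


def variance_explained(columns, k=2):
    """Proportion of total variance carried by each of the first k PCs."""
    corr = correlation_matrix(columns)
    return [lam / len(columns) for lam in leading_eigenvalues(corr, k)]
```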


Figure S6. Regional means for 19 bioclimatic variables (BC) based on occurrences of foxtail pine illustrate differences between the climates of the Klamath Mountains and southern Sierra Nevada. Values were centered and scaled using the entire dataset (occurrences plus background points), so zero corresponds to the global mean and deviations from it are in units of the global standard deviation. For each bioclimatic variable, values plotted on the left (lighter colors) are for the Klamath Mountains and values on the right (darker colors) are for the southern Sierra Nevada. Vertical lines give the 95% confidence interval for the mean (± 1.96 × standard error of the mean). Bioclimatic variables are colored based on whether they are temperature-related (orange, brown), precipitation-related (blue, dark blue), or related to both temperature and precipitation (purple, dark purple).
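The quantities plotted in Figure S6 can be computed in two steps: standardize a variable by the mean and standard deviation of the pooled dataset, then average the standardized occurrence values for one region and attach ±1.96 standard errors. A minimal sketch for a single variable follows; using the sample (n − 1) standard deviation is our assumption, and the function name and example values are hypothetical.

```python
import math
from statistics import mean, stdev


def standardized_regional_summary(all_values, region_values):
    """Standardize region_values by the mean and SD of all_values and return
    (mean, lower 95% limit, upper 95% limit) on that standardized scale."""
    mu, sd = mean(all_values), stdev(all_values)
    z = [(x - mu) / sd for x in region_values]
    m = mean(z)
    se = stdev(z) / math.sqrt(len(z))
    return m, m - 1.96 * se, m + 1.96 * se


# Hypothetical values of one variable at occurrences in a region, pooled with
# background points for the global standardization.
region = [950.0, 1010.0, 1100.0, 980.0]
background = [600.0, 700.0, 820.0, 1200.0, 1300.0]
print(standardized_regional_summary(region + background, region))
```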


Figure S7. Correlations among bioclimatic variables (bio) for localities of foxtail pine in the Klamath Mountains (above the diagonal) and the southern Sierra Nevada (below the diagonal) range from strongly negative to strongly positive. Values within cells are Pearson correlation coefficients (r) based on bioclimatic variables extracted from WorldClim GIS layers for the localities where foxtail pine was known to occur in each regional population (n = 65 for the Klamath Mountains, n = 144 for the southern Sierra Nevada). Cell colors are proportional to the value of r, with dark red being r = -1.0 and white being r = 1.0.
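Figure S7 packs two Pearson correlation matrices into a single display, one region above the diagonal and the other below it. The sketch below computes Pearson's r for every pair of variables and merges two such matrices in that layout. The data layout (one list of values per variable) and the function names are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corr_matrix(variables):
    """Full correlation matrix for a list of variables."""
    return [[pearson(u, v) for v in variables] for u in variables]


def combine_triangles(upper_source, lower_source):
    """Upper triangle from one matrix, lower triangle from the other,
    and ones on the diagonal."""
    p = len(upper_source)
    return [[1.0 if i == j else (upper_source[i][j] if j > i else lower_source[i][j])
             for j in range(p)] for i in range(p)]


# Hypothetical data: three variables at four localities in each region.
km = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
sn = [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1]]
combined = combine_triangles(corr_matrix(km), corr_matrix(sn))
```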


Figure S8. The relationship between permutation importance (PI) scores for bioclimatic variables (Bio) derived from each species distribution model (SDM). Values are not significantly correlated between SDMs at α = 0.05 (Spearman’s ρ = -0.089, P = 0.7171). For symbols without apparent error bars, the diameter of the circle was greater than the standard deviation. Values > 10 are labeled for each SDM.
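The Spearman coefficient in Figure S8 can be recomputed from the mean PI scores in Table S2. The sketch below ranks each column (tied values, such as the four zero scores in the Klamath Mountains SDM, share the average of their ranks) and takes Pearson's r of the ranks. Using the table means rather than per-run values is our assumption about how the figure was built; with these means the result rounds to the ρ = -0.089 given in the caption. The P-value is not recomputed here.

```python
import math

# Mean permutation importance (PI) from Table S2, ordered Bio1..Bio19.
KM_PI = [69.3646, 0.2639, 2.0402, 9.6481, 0.0000, 1.0080, 0.0000, 0.0000, 0.4839,
         0.6715, 0.4255, 0.0410, 0.0240, 6.0128, 0.3469, 1.3043, 6.9994, 1.3658, 0.0000]
SN_PI = [0.0684, 10.9376, 4.1491, 1.6714, 0.0988, 0.1551, 0.0242, 34.3415, 1.5396,
         0.0057, 0.0747, 3.7135, 0.5832, 1.3117, 38.7247, 0.0412, 1.0055, 0.3119, 1.2421]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


print(round(spearman_rho(KM_PI, SN_PI), 3))   # -0.089
```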


Figure S9. Jackknife estimates of variable importance based on AUC for bioclimatic variables derived from the species distribution model (SDM) for the Klamath Mountains population of foxtail pine.


Figure S10. Jackknife estimates of variable importance based on test gain for bioclimatic variables (bio) derived from the species distribution model (SDM) for the Klamath Mountains population of foxtail pine.


Figure S11. Jackknife estimates of variable importance based on regularized test gain for bioclimatic variables (bio) derived from the species distribution model (SDM) for the Klamath Mountains population of foxtail pine.


Figure S12. Jackknife estimates of variable importance based on AUC for bioclimatic variables derived from the species distribution model (SDM) for the southern Sierra Nevada population of foxtail pine.


Figure S13. Jackknife estimates of variable importance based on test gain for bioclimatic variables derived from the species distribution model (SDM) for the southern Sierra Nevada population of foxtail pine.


Figure S14. Jackknife estimates of variable importance based on regularized test gain for bioclimatic variables derived from the species distribution model (SDM) for the southern Sierra Nevada population of foxtail pine.
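Figures S9 to S14 report jackknife tests of variable importance, in which the model is refitted once without each variable and once with that variable alone, and the resulting score (AUC, test gain, or regularized gain) is compared with that of the full model. The sketch below shows only the bookkeeping of that procedure. `fit_and_score` is a hypothetical placeholder for whatever fits an SDM on a subset of variables and returns a score, and the additive toy scorer is purely illustrative; real SDM scores are not additive across variables.

```python
from typing import Callable, Dict, Sequence, Tuple


def jackknife_importance(
    variables: Sequence[str],
    fit_and_score: Callable[[Sequence[str]], float],
) -> Tuple[float, Dict[str, Tuple[float, float]]]:
    """Full-model score plus, per variable, (score without it, score with it alone)."""
    full = fit_and_score(list(variables))
    results = {}
    for v in variables:
        without = [u for u in variables if u != v]
        results[v] = (fit_and_score(without), fit_and_score([v]))
    return full, results


# Toy scorer (hypothetical): each variable adds a fixed amount of "gain".
TOY_GAIN = {"Bio1": 0.8, "Bio12": 0.3, "Bio17": 0.5}


def toy_score(subset):
    return sum(TOY_GAIN[v] for v in subset)


full, res = jackknife_importance(list(TOY_GAIN), toy_score)
for v, (without, alone) in res.items():
    print(f"{v}: without = {without:.2f}, alone = {alone:.2f}, full = {full:.2f}")
```

Read in the usual way, the variable whose removal lowers the score most carries the most information not shared with the others, whereas the variable with the highest score on its own is the most informative in isolation.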


Figure S15. Summaries of niche divergence for foxtail pine reveal statistically significant divergence. (A) Observed localities of each regional population (red = southern Sierra Nevada; blue = Klamath Mountains) relative to 5,000 background points per regional population (green = southern Sierra Nevada; salmon = Klamath Mountains) for two bioclimatic variables (Bio18 on the x-axis and Bio19 on the y-axis). (B) Null distributions (n = 100 permutations) of the D and I statistics relative to the observed values (red lines) show that the null hypothesis, that the two SDMs are no more differentiated than SDMs randomly drawn from a common SDM with non-overlapping geographical distributions for each regional population, is not well supported. (C–F) Null distributions (n = 100 permutations) of the D (panels C and D) and I (panels E and F) statistics relative to the observed values (red lines), for the southern Sierra Nevada relative to the Klamath Mountains background (panels C and E) and for the Klamath Mountains relative to the southern Sierra Nevada background (panels D and F), show that the observed values are poorly predicted by the null hypothesis.
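Assuming, as is standard in niche-overlap tests, that D in Figure S15 is Schoener's D, it compares two SDM predictions after each is rescaled to sum to one across grid cells: D = 1 − ½ Σ|p_X,i − p_Y,i|, which runs from 0 (no overlap) to 1 (identical predictions). The sketch below computes D and an empirical one-sided P-value for an observed overlap against a permutation null such as the 100 permutations shown in the figure. The I statistic is not sketched, the (k + 1)/(n + 1) P-value convention is one common choice rather than necessarily the one used here, and the suitability values are hypothetical.

```python
def normalize(values):
    """Rescale non-negative per-cell suitabilities so they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def schoeners_d(suit_x, suit_y):
    """Schoener's D between two suitability surfaces given as per-cell values."""
    px, py = normalize(suit_x), normalize(suit_y)
    return 1 - 0.5 * sum(abs(a - b) for a, b in zip(px, py))


def empirical_p_lower(observed, null_values):
    """One-sided P-value that the observed overlap is lower than the null
    expectation, using (k + 1) / (n + 1) so that P is never exactly zero."""
    k = sum(v <= observed for v in null_values)
    return (k + 1) / (len(null_values) + 1)


print(schoeners_d([1, 2, 3, 4], [1, 2, 3, 4]))   # 1.0 (identical surfaces)
print(schoeners_d([1, 1, 0, 0], [0, 0, 1, 1]))   # 0.0 (no shared cells)
```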


Figure S16. Autocorrelation functions (ACF = Pearson’s r) at various lags reveal moderate levels of spatial autocorrelation for the F-statistic used to test for the presence of QTLs along linkage groups for each trait. (A) Autocorrelation of the F-statistic for δ13C. (B) Autocorrelation of the F-statistic for δ15N.
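The autocorrelation function in Figure S16 is defined as Pearson's r at a given lag, that is, the correlation between the F-statistic at one map position and the F-statistic a fixed number of positions further along the same linkage group. A minimal sketch for one linkage group follows; it assumes the F-statistics were computed at evenly spaced positions, and the function names and F values in the example are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def acf(series, max_lag):
    """Pearson's r between series[t] and series[t + lag] for lag = 1..max_lag."""
    return [pearson(series[:-lag], series[lag:]) for lag in range(1, max_lag + 1)]


f_stats = [1.2, 1.5, 2.1, 3.4, 3.0, 2.2, 1.4, 1.1, 0.9, 1.3]  # hypothetical F values
print([round(r, 2) for r in acf(f_stats, 3)])
```

Unlike the textbook ACF, which uses the mean and variance of the whole series, this version recomputes both for each pair of overlapping segments, matching the caption's description of the ACF as Pearson's r.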


Figure S17. The F-statistics for the two traits, measured at a resolution of 1 cM, show little to no correlation.


Figure S18. The negative relationship between the correlation in family effects from the two-locus QTL models, assessed using the t-statistic for each family at each QTL on the same linkage group, and the difference in position between the two QTLs on that linkage group.


Figure S19. Average estimates of h² for δ13C in sugar pine are inflated when a small number of families (n) is used. Averages are arithmetic means across 100 randomly selected datasets for each value of n. The horizontal red line is the estimate reported in Eckert et al. (2015).
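Figure S19 averages h² estimates over 100 random draws of n families for each n. The sketch below reproduces that resampling loop. The heritability estimator is an assumption made for illustration only: a balanced one-way ANOVA in which open-pollinated families are treated as half-sibs, so that h² = 4σ²_f / (σ²_f + σ²_e); the estimator used by Eckert et al. (2015) may differ, and the function names and toy families in the test are hypothetical. With few families the among-family variance component is poorly estimated, which is why individual estimates can fall outside [0, 1].

```python
import random
from statistics import mean


def half_sib_h2(families):
    """h^2 from a balanced one-way ANOVA, assuming half-sib families
    (additive variance = 4 x among-family variance component)."""
    k = len(families[0])                      # offspring per family
    if any(len(f) != k for f in families):
        raise ValueError("this sketch assumes a balanced design")
    a = len(families)
    grand = mean(x for f in families for x in f)
    fam_means = [mean(f) for f in families]
    ms_among = k * sum((m - grand) ** 2 for m in fam_means) / (a - 1)
    ms_within = sum((x - m) ** 2
                    for f, m in zip(families, fam_means) for x in f) / (a * (k - 1))
    var_f = (ms_among - ms_within) / k        # among-family variance component
    return 4 * var_f / (var_f + ms_within)


def mean_h2_for_n(families, n, reps=100, seed=0):
    """Average h^2 over `reps` random draws of n families without replacement."""
    rng = random.Random(seed)
    return mean(half_sib_h2(rng.sample(families, n)) for _ in range(reps))
```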
