ABSTRACT

A CONSENSUS MODEL FOR PREDICTING THE DISTRIBUTION OF THE THREATENED PLANT TELEPHUS SPURGE (EUPHORBIA TELEPHIOIDES)

by Jason Thomas Bracken

A consensus species distribution model was created to predict the presence of the federally threatened plant Telephus spurge in three counties in the Florida panhandle. Final results were created by averaging the results of a top generalized linear model created in this study with those of a top machine-learning model developed by a fellow researcher prior to this study. These two initial models began with the same data sets and followed the same procedure for spatial filtering, but then followed separate methodologies suited to their specific model types and statistical concerns. Final predictions showed several areas of interest along the coast and around St. Andrew’s Bay, among others. The highest probability for Telephus spurge presence was located on Tyndall Air Force Base. Without researcher knowledge, Telephus spurge was discovered on Tyndall Air Force Base in one of these areas of high probability before this study was completed. This discovery is the first in that area, is distinct from previously known populations of Telephus spurge, and represents a minor example of model validation through incidental ground-truthing. It is hoped that this study can continue to predict the location of new populations or preferred habitat to help in conservation efforts.

A CONSENSUS MODEL FOR PREDICTING THE DISTRIBUTION OF THE THREATENED PLANT TELEPHUS SPURGE (EUPHORBIA TELEPHIOIDES)

A Thesis

Submitted to the

Faculty of Miami University

in partial fulfillment of

the requirements for the degree of

Master of Environmental Science

Institute for the Environment and Sustainability

by

Jason Thomas Bracken

Miami University

Oxford, Ohio

2016

Advisor: Dr. Jing Zhang

Reader: Robbyn Abbitt

Reader: Dr. Sarah Dumyahn

©2016 Jason Thomas Bracken

This Thesis titled

A CONSENSUS MODEL FOR PREDICTING THE DISTRIBUTION OF THE THREATENED PLANT TELEPHUS SPURGE (EUPHORBIA TELEPHIOIDES)

by

Jason Thomas Bracken

has been approved for publication by

The College of Arts and Science

and

Institute for the Environment and Sustainability

______Dr. Jing Zhang

______Robbyn Abbitt

______Dr. Sarah Dumyahn

Table of Contents

List of Tables ...... iv
List of Figures ...... v
Acknowledgements ...... vii
INTRODUCTION: ...... 1
Telephus Spurge ...... 1
Species Distribution Modeling ...... 2
METHODS: ...... 3
Study Area ...... 4
Data Set Design ...... 5
Variables ...... 7
Model Selection ...... 9
Prediction ...... 10
Comparison ...... 11
Consensus ...... 11
Results: ...... 12
Model Selection ...... 12
Prediction ...... 14
Comparison ...... 16
Consensus ...... 20
Discussion: ...... 22
Conclusions ...... 22
Limitations and Areas for Improvement ...... 23
Literature Cited: ...... 26
APPENDIX A – TABLE OF ALL VARIABLES CONSIDERED IN STUDY ...... 30
APPENDIX B – LIST OF ALL LAND USE/LAND COVER TYPES IN STUDY AREA ...... 33
APPENDIX C – LIST OF ALL SOIL TYPES IN STUDY AREA ...... 37
APPENDIX D – MAP OF ROCK TYPES IN STUDY AREA ...... 41
APPENDIX E – MAPS OF PRECIP. AND TEMP. IN STUDY AREA ...... 42
APPENDIX F – MAPS OF SPECTRAL VEGETATIVE INDICES IN STUDY AREA ...... 44
APPENDIX G – MAPS OF ELEVATION AND SLOPE ...... 49
APPENDIX H – TABLE OF AUC RESULTS FOR REPLICATE GLM MODELS ...... 50


List of Tables

Table 1. Summary of variable selection results ...... 12

Table 2. Summary of model accuracy results ...... 13

Table 3. Table of GLM Weighted Model coefficients ...... 14

Table 4. Rank of variable importance for top models ...... 17

Table 5. List of all variables considered in study ...... 30

Table 6. List of all land use/land cover types in study area ...... 33

Table 7. List of all soil types in study area ...... 37

Table 8. Table of AUC results for replicate models using training data ...... 50

Table 9. Table of AUC results for replicate models using partial training data ...... 51


List of Figures

Figure 1. Methodological flowchart ...... 3

Figure 2. Map of study area and presence points ...... 4

Figure 3. Diagram of spatial filtering process for presence standardization ...... 5

Figure 4. Map displaying typical pseudo-absence distribution ...... 7

Figure 5. Map of GLM Mean Weighted model ...... 15

Figure 6. Comparison of GLM Weighted minimum, mean, and maximum models ...... 16

Figure 7. Comparison of GLM Mean Weighted model and BRT 70 Out model ...... 18

Figure 8. Map of general presence/absence agreement between models ...... 19

Figure 9. Map of probability differences between models ...... 19

Figure 10. Map of Consensus model ...... 20

Figure 11. Map of Consensus model with generalized probability ...... 21

Figure 12. Methodological flowchart with final determinations ...... 22

Figure 13. Map of Consensus model with new location of Telephus spurge ...... 25

Figure 14. Map of rock types in study area ...... 41

Figure 15. Maps of average precipitation in February, May, and August ...... 42

Figure 16. Maps of average temperature in February, May, and August ...... 43

Figure 17. Maps of NDVI for February, May, and August ...... 44

Figure 18. Maps of EVI for February, May, and August ...... 45

Figure 19. Maps of TCTB for February, May, and August ...... 46

Figure 20. Maps of TCTG for February, May, and August ...... 47

Figure 21. Maps of TCTW for February, May, and August ...... 48

Figure 22. Map of elevation ...... 49


Figure 23. Map of percent slope ...... 49


Acknowledgements

I would like to thank my adviser, Dr. Jing Zhang, and committee members Robbyn Abbitt and Dr. Sarah Dumyahn for their direction, expertise, and encouragement. I would like to thank the U.S. Fish & Wildlife Service and the Panama City field office staff, including Dr. Vivian Negron Ortiz, Lydia Ambrose, Gayle Martin, Dr. Adam Kaeser, Mary Mittiga, Sandy Pursifull, and others, for the opportunities and support they provided me. I also wish to thank Miami University and the Institute for the Environment and Sustainability, particularly Dr. Thomas Crist, Suzanne Zazycki, and Denise Withrow, for their guidance. I would also like to thank fellow researcher Alexa Mainella for her work on Telephus spurge, which contributed to the consensus model developed in this study.


INTRODUCTION:

Telephus spurge (Euphorbia telephioides) is a federally threatened species of plant endemic to only three counties in the Florida panhandle (USFWS 2014, USFWS 1992). There are 41 known sites, approximately nine of which appear to have been extirpated by development (USFWS 2014). Approximately three-fourths of known sites exist on private lands and are not afforded government protection and management (USFWS 2014). Current populations are fragmented and threatened by habitat loss, fire suppression, and the prospect of sea level rise, among others (USFWS 2014, USFWS 1992). Discovery of new populations of Telephus spurge could greatly improve efforts to preserve the species by providing protection and management of the new populations and increasing knowledge of its distribution and preferences.

Species distribution modeling (SDM) is the process of predicting species occurrence or abundance based on statistical associations of the species with environmental variables (Franklin 2013). Most often, observations of the species are spatially linked to environmental predictors using digital maps and applying various algorithms or statistical learning methods (Franklin 2013). The associations derived from these methods can then be reapplied to the environmental predictors for different areas, conditions, or times to suggest where and in what abundance the species might exist. The results of the modeling are often rendered into digital maps showing probabilities of presence or predicted abundance (Franklin 2013). Applications of SDMs include ecology, biogeography, biodiversity assessment, conservation biology, wildlife management, and conservation planning (Franklin 2013, Peterson and Soberon 2012, Elith et al. 2006, Marmion et al. 2009). Examples of model outcomes might include determining significant factors to species presence, locating unknown populations, or predicting the effects of climate change on species distribution (Franklin 2013, Williams et al. 2009).
Though specific methodology, parameters, and results are highly variable, SDMs on the whole are well established and approximately a thousand papers a year are published using such methods (Peterson and Soberon 2006).

In this study I created SDMs for Telephus spurge based on a platform of Generalized Linear Models (GLMs) in an attempt to help discover new populations and to understand the potential distribution of Telephus spurge. I focused on reducing sample bias, spatial autocorrelation, multicollinearity, and overfitting to address concerns stemming from species attributes, study area, and weaknesses in available data. Additionally, I went on to compare my top model with the top model created by a fellow researcher using similar data with machine learning methods. I then combined our models into a consensus model in an attempt to further strengthen our results and conclusions.

Telephus Spurge

Telephus spurge is a small green perennial herbaceous plant reaching a height of 25cm (Bridges and Orzell 2002). The plant has numerous stems, tuberous roots, small cup-like flowers, and three-lobed capsule-like fruits (USFWS 2015, Trapnell et al. 2012). It has only been found in Bay, Franklin, and Gulf counties within 7km of the coast along the Florida panhandle. There have been 41 known populations, but many consist of small numbers in localized areas spread among only three more general regions (USFWS 2014). It is often associated with wire grass and with longleaf and slash pines in xeric to mesic pine flatwoods and scrubby pinelands, and prefers disturbed sandy soils and frequent fire (USFWS 2014, Trapnell et al. 2012). Given its limited height, disturbance may allow it reprieve from being shaded out by faster growing palmetto and titi (Bridges and Orzell 2002). Observations suggest Telephus spurge may survive more difficult and competitive conditions by existing ephemerally as a tuberous root below ground until conditions improve after fire and other disturbance (Trapnell et al. 2012, Negron-Ortiz 2014 pers. obs.). The plant can be found with male, female, subdioecious (both sexes), and monoecious (non-binary) cyathia and may change sex in response to disturbance (Trapnell et al. 2012, Negron-Ortiz 2014 pers. obs.).

Telephus spurge was listed as threatened by the U.S. Fish and Wildlife Service in 1992 (USFWS 1992). It has a limited distribution and population. The most immediate threat is human development, while climate change and subsequent sea level rise represent a long term threat that appears imminent (USFWS 2014). No population exists more than 3 meters above sea level, and the species does not seem to readily disperse over large distances that might allow for species migration (Negron-Ortiz pers. obs.). Attempts at transplantation have so far not been particularly successful and may not provide a suitable conservation strategy (Ecological Resource Consultants 2006, Negron-Ortiz pers. obs. 2014).

Species Distribution Modeling

SDMs are numeric tools used to evaluate occurrence or abundance of a species in light of its associated environmental conditions to predict the likelihood of additional occurrence or abundance, sometimes over varying spatial and temporal scenarios (Franklin 2013, Elith and Leathwick 2009). They are often implemented as a cost effective means for informing conservation and wildlife management decisions (Elith and Leathwick 2009).

There are many kinds of SDMs. Current classifications include regression techniques, neural networks, ordination and classification methods, Bayesian models, locally weighted approaches, environmental envelopes, and hybrids (Guisan and Zimmermann 2000). Different kinds of SDMs are associated with different types of data, processing, prediction, and performance over different conditions.

GLMs are an extension of linear regression models that are made more flexible to allow for non-linear relationships and data with non-constant variance (Guisan et al. 2002). GLMs have become a popular tool in ecological applications and have seen extensive use in SDMs because of their strong statistical foundation and realistic modeling (Guisan and Zimmermann 2000, Guisan et al. 2002, Elith et al. 2006). Newer and more complex methods such as machine learning have become more popular recently, and some studies suggest that while GLMs are competitive, they are frequently outperformed by such novel methods (Williams et al. 2009, Elith et al. 2006). Other studies have suggested that seemingly superior performance in machine-learning methods might come from the limitations of in-sample testing and that such complexity may lead to overfitting and a lack of transferability not seen in more traditional methods such as GLMs (Wenger and Olden 2012).


In contrast to the GLMs created in this study, fellow researcher Alexa Mainella created a suite of machine learning models using the same original data. As consensus models have been shown to be more robust than individual models (Marmion et al. 2009) and GLMs tend to have opposing strengths and weaknesses to machine-learning methods, a consensus model of these two approaches might produce more balanced results.

METHODS:

The methodology for this study is complex. In order to maintain a sense of each step’s place within the methodology, two flowcharts were generated. Figure 1, below, outlines the steps followed in this study. It shows decision points and subsequent actions. Figure 12, shown on page 22 at the end of the Results section, shows the same flowchart outline but with the final determinations applied to each decision point seen in Figure 1.

Figure 1. A flowchart of the methodological approach used in this study. Boxes in white show steps in which a decision must be made. Boxes in green show the necessary subsequent action.


Study Area

The study area is limited to the three Florida panhandle counties currently known to hold Telephus spurge populations: Bay, Gulf, and Franklin (Figure 2). The study area was further reduced to decrease spatial autocorrelation and computational burden by limiting the extent of the counties inland to approximately 30km (Boria et al. 2014). This distance was selected to make sure that the study area encompassed all of the Grand Lagoon and brackish tributaries because of Telephus spurge’s association with the coast and possible marine influence. This is based on the history of Telephus spurge only being observed within 7km of the Gulf coast. Limiting the study area to 30km of the coast, as compared to the full extent of the three counties, reduces the area considered in this study by approximately 9%. Because of Bay County’s northward-extending panhandle, limiting the study area removed some areas that would have been as far as 59km from the coast, and so an additional 29km from the new study area boundary.

Figure 2. Location of Telephus spurge populations, study area, and boundaries of Bay, Gulf, and Franklin counties in Florida.


Data Set Design

Presence data were collected using GPS over several decades by various researchers within the U.S. Fish & Wildlife Service and Florida Natural Areas Inventory (FNAI). There was not a consistent protocol for data collection over the course of surveying, and presences were indicated variably by polygons, points representing individual plants, and points representing the general location of multiple plants. Some field observers seem to have recorded the presence of Telephus spurge by outlining the area of observation, creating a polygon. Some observers recorded each individual Telephus spurge plant as a single point. Others recorded the general presence of Telephus spurge as a point representing the approximate center of their observations. These different methods did not lead to consistent representation of plant presence or density and so required a means of standardizing their findings before proceeding.

Standardizing Presence Data

In order to standardize the data for processing, as well as to reduce spatial autocorrelation and sampling bias, the data were filtered using a grid that replaces all presence data within a grid cell with a single centroid point (Williams et al. 2009, Boria et al. 2014) (Figure 3). These centroids would then be the proxy presence points from which environmental data would be drawn. The grid cells were made to be 70.72m by 70.72m so that the proxy presence point would be no more than approximately 50m (half the cell diagonal) from the location of the original data.
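The grid filtering described above can be sketched as follows. This is a minimal illustration in Python rather than the GIS workflow used in the study; the grid origin, the function names, and the sample coordinates are assumptions made for the example only.

```python
import math

CELL = 70.72  # grid cell side in metres, as stated in the text


def cell_centroid(x, y, cell=CELL, x0=0.0, y0=0.0):
    """Return the centroid of the grid cell that contains point (x, y).

    (x0, y0) is a hypothetical grid origin in projected metres.
    """
    col = math.floor((x - x0) / cell)
    row = math.floor((y - y0) / cell)
    return (x0 + (col + 0.5) * cell, y0 + (row + 0.5) * cell)


def filter_presences(points, cell=CELL):
    """Collapse all presence points to one centroid per occupied cell."""
    return sorted({cell_centroid(x, y, cell) for x, y in points})


# Hypothetical projected coordinates in metres (not from the study data).
points = [(10.0, 10.0), (60.0, 65.0), (80.0, 10.0), (150.0, 200.0)]
centroids = filter_presences(points)

# The largest possible offset between a point and its cell centroid
# is half the cell diagonal.
max_offset = CELL * math.sqrt(2) / 2
```

The first two sample points fall in the same cell and therefore share one proxy presence point, which is how clustered observations are thinned to a single record per cell.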

Figure 3. Representation of how grid cells were used to generate centroids as proxy presence points for the original location data. Red centroids represent the standardized presences.

Generating Pseudo-absences

The GLM used in this study is a logistic regression model requiring presence/absence data. This means that it requires data not only on where the selected species has been found, but also where it has failed to be found. Under perfect circumstances, a species might be said to be confidently absent after repeated thorough surveys of an area (Biodiversity and Climate Change Virtual Laboratory 2016). Unfortunately, species in need of protection, by their general nature, tend to be rare and so difficult to observe. As well, surveying resources may be limited such that sufficient repeated passes of an area are not possible (Biodiversity and Climate Change Virtual Laboratory 2016). In these situations, true absence data are not available. Telephus spurge is just such an example. In order to create presence/absence models without true absence data, pseudo-absences are generated. Pseudo-absences are artificial absences used to provide the background data necessary for such models. There are many different ways to generate pseudo-absences, and method selection is very important to a model’s function and accuracy (Barbet-Massin et al. 2012, VanDerWal et al. 2009). In this study, I sought first to ensure convergence of the model, which required a minimum number of pseudo-absences. Convergence is simply the ability of the model program to run successfully and generate the estimated coefficients. Failure to converge is often associated with data limitations that produce infinitely positive or negative coefficients because a predictor variable can perfectly predict the presence or absence of the response variable (Institute for Digital Research and Education 2016). In these cases, the coefficients cannot actually be calculated and so the model does not converge. After ensuring convergence, I then sought to decrease spatial autocorrelation without significantly reducing sensitivity.
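As an illustration of the convergence failure described above, the following Python sketch (not the R code used in the study) evaluates the logistic log-likelihood for a single hypothetical predictor that perfectly separates presences from absences. The data values and the function name are assumptions for the example.

```python
import math


def log_likelihood(beta, xs, ys):
    """Log-likelihood of a one-predictor logistic model with no intercept."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-beta * x))
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: every negative x is an absence, every positive x a presence,
# so the predictor perfectly predicts the response.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# The log-likelihood keeps improving as the coefficient grows, so there is
# no finite maximum for the fitting routine to settle on.
betas = [1.0, 2.0, 4.0, 8.0]
lls = [log_likelihood(b, xs, ys) for b in betas]
```

Because each larger coefficient fits the separated data slightly better, an iterative fit drifts toward an infinite estimate, which is the non-convergence the text refers to.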
Spatial autocorrelation is the condition in which presence data contain associated features merely because of proximity rather than causation, leading to over-fitting of the model (Barbet-Massin et al. 2012). Sensitivity is the ability of the model to minimize true presences recorded as absences (Barbet-Massin et al. 2012). Barbet-Massin et al. (2012) suggested 10,000 pseudo-absences or 10 replicates of 1,000 pseudo-absences for GLMs. I found that too many pseudo-absences tended to cause computational difficulties, and there was concern over drowning out the signal from the presences. However, too few pseudo-absences tended to create convergence issues, and they could also leave the model without categorical variable types that were present elsewhere in the study area. Using a 1 to 3 ratio of presences to pseudo-absences (534:1,602) with 10 replicate data sets seemed to address these difficulties while still generally holding to the suggestions of Barbet-Massin et al. I did not, however, attempt to follow Barbet-Massin et al.’s suggestion with regard to pseudo-absence location. In an attempt to decrease spatial autocorrelation and sampling bias, I generated half of the pseudo-absences (801) randomly within a 10km buffer of the presence locations, while the remaining half were generated randomly throughout the entire study area. Figure 4 shows a typical data set after generating pseudo-absences from these methods.
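A rough sketch of this two-part sampling scheme is given below in Python. It assumes a rectangular study area and uses simple rejection sampling, neither of which is taken from the study; the presence coordinates, bounds, and function name are hypothetical.

```python
import math
import random


def sample_pseudo_absences(presences, bounds, n_total, buffer_m=10_000.0, seed=0):
    """Draw half the pseudo-absences within buffer_m of any presence and
    the other half anywhere in the (assumed rectangular) study area."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds

    def random_point():
        return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

    def near_presence(p):
        return any(math.dist(p, q) <= buffer_m for q in presences)

    n_near = n_total // 2
    near = []
    while len(near) < n_near:  # rejection sampling inside the buffer
        p = random_point()
        if near_presence(p):
            near.append(p)
    anywhere = [random_point() for _ in range(n_total - n_near)]
    return near, anywhere


# Hypothetical presence centroids and study-area rectangle, in metres.
presences = [(20_000.0, 5_000.0), (50_000.0, 8_000.0), (80_000.0, 3_000.0)]
bounds = (0.0, 0.0, 100_000.0, 30_000.0)

# 1,602 pseudo-absences per replicate, split 801 / 801 as in the text.
near, anywhere = sample_pseudo_absences(presences, bounds, 1602)
```

Changing the seed for each call would give the ten replicate data sets, each with a different random draw of pseudo-absences.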


Figure 4. An example from one of the data sets of presences and typical pseudo-absence distribution

Variables

Prior to variable elimination through reduction of multicollinearity and model selection, 29 variables were considered for inclusion. The data for these variables were extracted from layer files using ArcGIS 10.3 and exported into an Excel file. The source data came in raster and vector formats from a number of sources and were produced using a number of different methods. Because of the importance of the environmental variables and their differing methodology, each is outlined below, broken up by category. For further detail see Table 5 in Appendix A.

Land Use/Land Cover

The land use/land cover (LULC) data considered for the model was a geodatabase feature class file from version 3.0 of the Cooperative Land Cover Map, created by the Florida Natural Areas Inventory (FNAI) in 2014. It was a fine scaled data set with 10m resolution. As mentioned, Telephus spurge has been associated with a number of land cover types, particularly varieties of pine flatwoods, making an LULC map an obvious choice for consideration in the model (Bridges and Orzell 2002, Trapnell et al. 2012, USFWS 2014). LULC is one of three categorical variables included in model consideration, along with soils and surficial geology. See Appendix B for a list of all LULC types found in the study area listed by acreage.


Soils

The soil data considered was gathered from shapefiles downloaded from the Soil Survey Geographic Database (SSURGO) from 2014, provided by the Natural Resources Conservation Service (NRCS). Resolution was 12m. Sandy soils have previously been associated with Telephus spurge, and more generally, soils have long been linked to plant success and distribution (USFWS 2014, Trapnell et al. 2012, Kruckeberg 1969). See Appendix C for a full list of soil types found in the study area listed by acreage.

Surficial Geology

Bedrock, or surficial geology, has also been tied to plant success and distribution through its influence on soil, hydrology, and surface conditions (Searcy et al. 2003, Strahler 1978). Surficial geology data was downloaded as shapefiles from the 2001 Florida Geological Survey provided by the Florida Department of Environmental Protection. Resolution was 50m. The surficial geology variable is listed as “Rock Type” in the data sets. See Appendix D for a map of the rock types present in the study area.

Climate

Temperature and precipitation were considered for the model. While macro level climate will often be homogeneous over study areas of this size, coastal and temporal influences were thought to provide some distinction that could be of importance. A 930m resolution dataset was downloaded as raster files from worldclim.org for both the average temperature and precipitation in the months of February, May, and August over the time span of 1960-1990. See Appendix E for temperature and precipitation maps of the study area over the months listed.

Vegetative Indices

Satellite imagery can be used to measure vegetation biomass via spectral analysis (Campbell and Wynne 2011). Different bands of wavelengths of reflected light can be analyzed from imagery to create Spectral Vegetative Indices (SVI) that speak to the amount and type of vegetation present (Campbell and Wynne 2011).
For example, green vegetation absorbs red light (R) and reflects near infrared (NIR) radiation, making it possible to contrast the amounts of each to determine photosynthetic activity in various ways (Campbell and Wynne 2011). The SVIs considered for the model were the normalized difference vegetation index (NDVI), enhanced vegetative index (EVI), and three types of tasseled cap transformations (TCTs) that represent brightness, greenness, and wetness. For each index, the months of February, May, and August were evaluated, as was done with climate data. The indices were derived from 30m resolution Landsat 8 OLI 2014 satellite imagery downloaded from the United States Geological Survey’s (USGS) EarthExplorer. We calibrated and mosaicked the images using ENVI 5.0 and then calculated the indices using ENVI Classic software. See Appendix F for maps of the SVIs calculated for the study area.

Topography


Slope and elevation were considered for the model. The variables were derived from a 10m resolution digital elevation model (DEM) from the National Elevation Dataset (NED) provided by the USGS. Bilinear interpolation was used for the elevation, while slope percentage was calculated from the maximum difference between the original cell and any one of its adjacent neighbor cells (ESRI 2007). Elevation and slope maps of the study area are provided in Appendix G.

Distance Variables

Proximity to several features (wetlands, roads, and the ocean) was thought to have the potential to be an important predictor. The distance to wetlands generally speaks to moisture content, hydrology, and potential for unknown associations of flora and fauna. While distance to road is likely to be prone to observation bias, it was considered because of the strong association of Telephus spurge with disturbance, particularly roadside mowing and rights-of-way (Bridges and Orzell 2002, Negron-Ortiz 2014 pers. obs.). Distance to ocean was considered because Telephus spurge has so far been restricted to within 7km of the coast (USFWS 2014). These variables were calculated using the “near” tool in ArcMap 10.3, which provided the shortest distance from the original presence or pseudo-absence point to the selected features. The wetland features were selected from the LULC data categories with applicable definitions based on metadata.

Variable Reduction and Standardization

In order to avoid over- or underweighting variables containing different ranges of data, z-score standardization was implemented on all non-categorical variables (see Appendix A for a list of categorical and non-categorical variables). Z-score standardization was achieved by taking the original variable value, subtracting the variable mean, and then dividing by the standard deviation of the variable.
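The calculation just described can be sketched in Python as follows. The function name, the optional reference argument, and the elevation values are assumptions for illustration, and the use of the population standard deviation is a choice made here rather than something stated in the text.

```python
import statistics


def zscore(values, reference=None):
    """Standardize values using the mean and standard deviation of a
    reference set (the values themselves when no reference is given)."""
    ref = values if reference is None else reference
    mean = statistics.fmean(ref)
    sd = statistics.pstdev(ref)  # population SD: an assumption of this sketch
    return [(v - mean) / sd for v in values]


# Hypothetical elevations in metres (not from the study data).
elevation_m = [0.5, 1.0, 1.5, 2.0, 3.0]
z = zscore(elevation_m)

# A subset can be standardized against a larger reference set, so that
# model inputs and prediction data share one scale.
z_subset = zscore(elevation_m[:2], reference=elevation_m)
```

Passing a separate reference set keeps the training data and the prediction data on a common scale, which matters when coefficients fitted on one are applied to the other.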
Z-score standardization gives each variable a mean of zero and a variance of one (Milligan and Cooper 1988) and helps to prevent undue significance of some variables over others. The standardization was applied to both the model input data and the full study area data used to predict from the model. The mean and standard deviation used for standardization were derived from the entire study area.

In order to reduce multicollinearity, which is the condition of having independent variables highly correlated with each other, I removed any non-categorical variable with a variance inflation factor (VIF) higher than 5 by using the vif function in R (Allen 1997). Multicollinearity is problematic because it undermines the significance of variables in multiple regression by inflating standard error (Allen 1997). When two or more of the environmental variables are highly correlated with each other, it is difficult to assess the effect of each on the probability of Telephus spurge presence. VIF is a measure of the multicollinearity of variables and allows us to remove those that would greatly increase the error in our model estimates. The variables that were removed and those that remained for model consideration can be seen in Table 1 in the Results section.

Model Selection


The remaining variables with VIF less than 5 were entered into the glm2 function in R, run through RStudio. The glm2 function was used in place of the standard glm function to allow for more stable convergence (Marschner 2011). The variables were only entered into glm2 additively, as interactions produced singularity, convergence, and computational problems throughout the data sets. The family designated in glm2 was binomial, whose default link is the logit. The final models for each data set were chosen by lowest Bayesian information criterion (BIC) score. BIC is one of the most popular and effective means for model selection (Cavanaugh and Neath 1999). Minimizing BIC maximizes posterior model probability while automatically penalizing model complexity and reducing overfitting (Cavanaugh 2009, Kass and Raftery 1995, Wit et al. 2012). Additionally, two sets of these models were constructed: one with pseudo-absences weighted equally to presences, and one with each pseudo-absence given one third the weight of a presence to account for pseudo-absences outnumbering presences 3 to 1.

The models were ultimately compared using the area under the curve (AUC) of the receiver operating characteristic (ROC), hereafter referred to simply as AUC. While BIC is a relative measure of model strength that can attempt to select the best of possible models given the parameters, it does not measure model accuracy in itself. AUC is a measure of model accuracy which considers the predictive ability of the model via sensitivity and specificity. Sensitivity is the percentage of true presences predicted to be presences, and specificity is the percentage of true absences predicted to be absences. When the model provided a probability greater than 50%, it was considered a prediction of presence (threshold = 0.5). In the case at hand, in which no true absence data were available, randomly selected background data can be used, and the interpretation and understanding should shift to incorporate it (Phillips et al. 2006, Phillips and Dudik 2008). That is to say, AUC in this case is a measure of the prediction of presences over background data and is still a useful metric for evaluating the predictive power of a model (Phillips and Dudik 2008). In this instance, models were evaluated against their own training data and also with the known training presences against random background data not included in their training data. In the latter case, the background pseudo-absences were in equal proportion to the known presences (534 each). Cross validation through partitioning of the known presences into testing and training sets was rejected, as the reduction in observations led to convergence problems and difficulties with full representation of categorical variables. Cross validation is a preferred evaluation method, as it considers only unseen data rather than some or all of the data the model was trained on. The absence of full cross validation metrics is a limitation of the model evaluation.

Prediction

A predict function in RStudio was used to generate probabilities for Telephus spurge occurrence using the selected models and an Excel file of extracted and standardized variable data for the entire study area. This was done only for the weighted models, as they were selected over the unweighted models based on their higher AUC scores (Table 2, in Results). The probabilities were exported into another Excel file corresponding to the coordinates of centroids comprising the study area. The coordinates were linked to their corresponding probabilities, and the updated Excel file was then imported into ArcGIS for display. This was done for each of the ten weighted models. As well, the prediction data from the ten weighted models were combined to produce a minimum, maximum, and mean model (Figure 6, in Results). The minimum and maximum models selected the smallest and the largest probability values of the ten models, respectively, for each centroid of the study area, and were used as a rough measure of model variability. The mean model was the final selected model, as it should address randomness within data sets (Barbet-Massin et al. 2012) and so is preferred over any of the ten individual models or the minimum or maximum models.

Comparison

Concurrently with my model creation, Alexa Mainella, a fellow researcher in the Institute for the Environment and Sustainability (IES), was also creating models for the presence of Telephus spurge (Mainella 2016). We began with the same raw data and standardized our presence and variable data in similar fashion, but then diverged on model type and parameters. My models were based on the regression analysis of GLMs, while Alexa created models using the machine learning methods of MaxEnt and Boosted Regression Trees (BRTs). Her best model based on AUC was a BRT model with 70m resolution (BRT 70 Out model) and pseudo-absences drawn randomly from the entire study area. I compared this model with my best model, the mean model derived from the ten 1/3rd weighted pseudo-absence GLM models (GLM Mean Weighted model).

To compare final variables and their relative significance to their models, I looked at percent contribution in Alexa’s model, but could only consider the mean coefficients for my model. This is due to the nature of making a mean model, as well as the limitations in deriving percent contribution of variables from GLM models using link functions or families other than identity and Gaussian (Package ‘relaimpo’ 2015). This, however, still allows for a general comparison of variables of importance, especially given the absence of interactions in the GLM models (Table 4, in Results).

To analyze the prediction results between the models, I first compared the model maps in ArcGIS 10.3 and then used Map Comparison Kit 3.2 for further analysis. I developed a figure comparing the model predictions in a side by side comparison of the probabilities of presence and a generalized comparison showing where each model predicted presence to be more likely than not (> 50%) (Figure 7, in Results). I also developed two maps that then showed these differences more explicitly (Figures 8 and 9, in Results).

Consensus

Consensus models have been shown to be more robust than individual models, and one of the more effective approaches has been the simple mean model (Marmion et al. 2009). Previously, I created a model from the mean of ten GLM models based on ten replicate data sets with weighted pseudo-absences. Another effective approach for consensus models is weighting the models by AUC, but the AUC scores in this case were derived from different testing data, and there were two averaged AUC scores generated for the GLM Mean Weighted model. Mean consensus was selected because of these complications and because mean models have shown similar results in previous studies (Marmion et al. 2009). The Consensus model is therefore the simple mean of the GLM Mean Weighted and BRT 70 Out predictions.
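As an illustration of the two consensus approaches discussed above, the following is a minimal Python sketch, not the workflow used in this study (which relied on R and ArcGIS). The function names, the probability values, and the AUC values passed to the weighted version are all hypothetical and chosen only to show the arithmetic.

```python
def mean_consensus(glm_probs, brt_probs):
    """Unweighted mean of two models' probabilities, centroid by centroid."""
    if len(glm_probs) != len(brt_probs):
        raise ValueError("both models must cover the same centroids")
    return [(g + b) / 2.0 for g, b in zip(glm_probs, brt_probs)]


def auc_weighted_consensus(prob_lists, aucs):
    """AUC-weighted mean across models (the alternative that was not selected)."""
    total = sum(aucs)
    n_centroids = len(prob_lists[0])
    return [
        sum(auc * probs[i] for auc, probs in zip(aucs, prob_lists)) / total
        for i in range(n_centroids)
    ]


# Hypothetical probabilities for four centroids (not values from the study).
glm = [0.95, 0.20, 0.05, 0.40]
brt = [0.91, 0.70, 0.01, 0.80]

consensus = mean_consensus(glm, brt)
# Hypothetical AUC scores; equal scores would reproduce the simple mean.
weighted = auc_weighted_consensus([glm, brt], aucs=[0.90, 0.95])
```

When the AUC scores of the component models are close to one another, the weighted and unweighted results differ very little, which is consistent with the choice of the simpler mean.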


Results:

Model Selection

Final models selected exactly the same variables using BIC regardless of weighting. Variables selected were also remarkably consistent across data sets. The models contained between 9 and 12 variables, with the same 9 variables occurring in all models (Table 1). Of the three categorical variables considered, only surficial geology (Rock Type) was selected. Of the remaining variables, four were vegetative indices, three were climate variables, three were distance variables, and one was topographic.

Table 1. Variables selected for each data set’s model. Weighted and unweighted models selected the same variables for the same corresponding data set. X = present in model, NS = Not Selected for model (not present in formula with lowest BIC), R = Removed from consideration (VIF > 5).

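Variable | Data Set 1 | Data Set 2 | Data Set 3 | Data Set 4 | Data Set 5 | Data Set 6 | Data Set 7 | Data Set 8 | Data Set 9 | Data Set 10
Rock Type | X | X | X | X | X | X | X | X | X | X
May Precipitation | X | X | X | X | X | X | X | X | X | X
August Precipitation | X | X | X | X | X | X | X | X | X | X
August Temperature | X | X | X | X | X | X | X | X | X | X
February NDVI | X | X | X | X | X | X | X | X | X | X
May TCTG | X | X | X | X | X | X | X | X | X | X
Elevation | X | X | X | X | X | X | X | X | X | X
Distance to Wetlands | X | X | X | X | X | X | X | X | X | X
Distance to Roads | X | X | X | X | X | X | X | X | X | X
August EVI | NS | X | X | X | NS | X | X | X | X | X
February TCTW | NS | X | X | X | X | X | X | NS | NS | X
Distance to Ocean | NS | NS | X | X | NS | NS | NS | NS | NS | NS
Land Use/Land Cover | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
Soil Type | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
May Temperature | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
February TCTB | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
May TCTB | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
August TCTB | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
August TCTW | NS | NS | NS | NS | NS | NS | NS | NS | NS | NS
February Precipitation | R | R | R | R | R | R | R | R | R | R
February Temperature | R | R | R | R | R | R | R | R | R | R
February EVI | R | R | R | R | R | R | R | R | R | R
February TCTG | R | R | R | R | R | R | R | R | R | R
May NDVI | R | R | R | R | R | R | R | R | R | R
May EVI | R | R | R | R | R | R | R | R | R | R
May TCTW | R | R | R | R | R | R | R | R | R | R
August NDVI | R | R | R | R | R | R | R | R | R | R
August TCTG | R | R | R | R | R | R | R | R | R | R
Slope | R | R | R | R | R | R | R | R | R | R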


Accuracy measures based on the full and partial training data showed that every model employing 1/3rd weighting for pseudo-absences had a higher AUC and sensitivity than its counterpart with equal weighting of pseudo-absences (Table 2 for model averages, Appendix H for individual model results). Specificity seemed to suffer in the weighted models, but was not valued as much as sensitivity and overall AUC. The kappa statistic, which measures observed accuracy relative to the accuracy expected by chance (Cohen 1960), was slightly larger in the unweighted model for the training data, but was much larger in the weighted model for the partial training data. Because the average AUC scores of the weighted models were greater for each set of prediction testing data, the weighted models were selected over the unweighted models for use in the final GLM model.

Table 2. Average of model accuracy against full and partial training data (Training presences, non-training pseudo-absences).

Model Type | Data used for accuracy measurements | Weights | Threshold | AUC | Omiss. Rate | Sens. | Spec. | Prop. Correct | Kappa
Average of unweighted models | Training Data | 1:1 | 0.5 | 0.80 | 0.31 | 0.69 | 0.91 | 0.85 | 0.60
Average of weighted models | Training Data | 1:1/3 | 0.5 | 0.84 | 0.12 | 0.88 | 0.80 | 0.82 | 0.59
Average of unweighted models | Training presences, non-training PAs | 1:1 | 0.5 | 0.83 | 0.31 | 0.69 | 0.97 | 0.83 | 0.66
Average of weighted models | Training presences, non-training PAs | 1:1/3 | 0.5 | 0.90 | 0.12 | 0.88 | 0.92 | 0.90 | 0.80
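For readers unfamiliar with the threshold-based measures reported in Table 2, the following minimal Python sketch shows how sensitivity, omission rate, specificity, proportion correct, and Cohen's kappa relate to one another at a 0.5 threshold. It is illustrative only; the function name and the observed and predicted values are hypothetical and are not the study data.

```python
def accuracy_measures(observed, probabilities, threshold=0.5):
    """Threshold-based accuracy measures for presence (1) / absence (0) data."""
    tp = fp = tn = fn = 0
    for obs, prob in zip(observed, probabilities):
        predicted = 1 if prob > threshold else 0
        if obs == 1 and predicted == 1:
            tp += 1
        elif obs == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    prop_correct = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals (Cohen 1960).
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "omission_rate": fn / (tp + fn),
        "specificity": tn / (tn + fp),
        "prop_correct": prop_correct,
        "kappa": (prop_correct - expected) / (1 - expected),
    }


# Hypothetical data: four presences followed by six pseudo-absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.1, 0.05]
measures = accuracy_measures(observed, probabilities)
```

Sensitivity and omission rate always sum to one, which is why those two columns of Table 2 mirror each other.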

The variable coefficients of the weighted model for each data set can be seen in Table 3. The categorical variable surficial geology, or Rocktype, was the most important variable in each of the data sets’ models. The strongest influence was from the rocktypes “delta”, “clastic/carbonate”, and “sandstone”, each with negative coefficients approximately three to four times greater than the absolute values of any other variable coefficient. Because these values are negative, rocktypes that are less negative like “clay or mud”, or rocktypes not included in the model such as “beach sand” or “alluvium” would be preferable for Telephus spurge presence based on these models. The majority of variables had a negative coefficient in the models, the exceptions being February NDVI, Elevation, and August Temperature.


Table 3. Final model coefficients for the variables of each data set. These models were selected using lowest BIC while considering the variables without interaction and while weighting pseudo-absences 1/3rd that of presences.

Variable | Data Set 1 | Data Set 2 | Data Set 3 | Data Set 4 | Data Set 5 | Data Set 6 | Data Set 7 | Data Set 8 | Data Set 9 | Data Set 10 | Average | Avg. Absolute Value
Rocktype - delta | -19.09 | -19.58 | -18.82 | -18.71 | -19.75 | -19.36 | -20.24 | -20.76 | -18.14 | -19.36 | -19.38 | 19.38
Rocktype - clastic/carb | -16.00 | -15.95 | -15.97 | -15.76 | -15.14 | -15.83 | -16.25 | -16.23 | -15.71 | -15.62 | -15.85 | 15.85
Rocktype - sandstone | -15.30 | -16.26 | -9.46 | -11.75 | -16.13 | -11.58 | -19.40 | -16.25 | -11.57 | -16.24 | -14.39 | 14.39
Rocktype - clay or mud | -4.62 | -4.59 | -5.09 | -4.90 | -4.55 | -4.67 | -5.40 | -5.70 | -3.76 | -5.07 | -4.84 | 4.84
Distance to Road | -2.68 | -3.01 | -2.94 | -2.56 | -2.85 | -2.77 | -2.84 | -2.88 | -2.84 | -2.80 | -2.82 | 2.82
August Precipitation | -1.87 | -1.88 | -1.96 | -2.17 | -2.13 | -2.00 | -2.43 | -2.44 | -1.49 | -2.30 | -2.07 | 2.07
May Precipitation | -1.21 | -1.14 | -1.19 | -1.35 | -1.35 | -1.15 | -1.36 | -1.44 | -1.11 | -1.41 | -1.27 | 1.27
February NDVI | 0.62 | 1.02 | 0.87 | 1.02 | 0.78 | 0.95 | 1.00 | 0.56 | 0.76 | 0.94 | 0.85 | 0.85
Rocktype - beach sand | -0.91 | -0.81 | -1.13 | -0.84 | -0.53 | -0.69 | -0.76 | -1.05 | -0.84 | -0.77 | -0.83 | 0.83
Elevation | 0.63 | 0.59 | 0.65 | 0.61 | 0.62 | 0.54 | 0.60 | 0.82 | 0.59 | 0.65 | 0.63 | 0.63
Distance to Wetlands | -0.52 | -0.44 | -0.56 | -0.45 | -0.62 | -0.56 | -0.61 | -0.59 | -0.51 | -0.66 | -0.55 | 0.55
May TCTG | -0.59 | -0.57 | -0.38 | -0.46 | -0.57 | -0.47 | -0.40 | -0.54 | -0.47 | -0.53 | -0.50 | 0.50
August Temperature | 0.48 | 0.47 | 0.45 | 0.48 | 0.52 | 0.50 | 0.39 | 0.44 | 0.51 | 0.51 | 0.48 | 0.48
August EVI | Absent | -0.39 | -0.33 | -0.30 | Absent | -0.35 | -0.41 | -0.36 | -0.40 | -0.27 | -0.35 | 0.35
February TCTW | Absent | -0.36 | -0.34 | -0.41 | -0.36 | -0.31 | -0.29 | Absent | Absent | -0.37 | -0.35 | 0.35
Distance to Ocean | Absent | Absent | -0.28 | -0.33 | Absent | Absent | Absent | Absent | Absent | Absent | -0.31 | 0.31
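The Average column of Table 3 appears to be taken over only those data sets in which a variable was present. The minimal Python sketch below reproduces two of the averages on that basis. It also converts one mean coefficient to an odds multiplier under the assumption, not stated in this section, that the GLMs used a logit link. The function name is hypothetical, and the coefficient values are copied from the August EVI and Elevation rows of Table 3.

```python
import math


def mean_coefficient(values):
    """Average a variable's coefficients, skipping data sets where it was absent."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


# Coefficients from Table 3; None marks "Absent".
august_evi = [None, -0.39, -0.33, -0.30, None, -0.35, -0.41, -0.36, -0.40, -0.27]
elevation = [0.63, 0.59, 0.65, 0.61, 0.62, 0.54, 0.60, 0.82, 0.59, 0.65]

evi_mean = mean_coefficient(august_evi)
elevation_mean = mean_coefficient(elevation)

# ASSUMPTION: logit link. Under that assumption, exp(coefficient) is the factor
# by which the odds of presence change per one-unit increase in the
# standardized variable, holding the other variables fixed.
elevation_odds_multiplier = math.exp(elevation_mean)
```

Under the logit assumption, the mean Elevation coefficient of 0.63 corresponds to the odds of presence increasing by a factor of roughly 1.9 for each one-unit increase in standardized elevation.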

Prediction

The final GLM model prediction, based on the results of the model selection process discussed above, is a mean weighted model averaging the predicted probability of Telephus spurge presence for each point in the study area from the final ten weighted model results. This model, now referenced as the GLM Mean Weighted model, shows the highest probabilities of Telephus spurge presence, approaching 100%, are around Tyndall Air Force Base, just south of the St. Andrew’s East Bay in Bay County (Figure 5). This area is the most notable area of high probabilities, even exceeding the areas of the known Telephus spurge presence data entered into the model. The significance of this area is not lost when viewing the min or max model results (Figure 6). The min and max models tend to show somewhat modest shifts in the probabilities of presence from the mean model and seem to emphasize or deemphasize areas already distinct from background, rather than producing novel areas of interest.
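The minimum, mean, and maximum models described above amount to a per-centroid summary across the replicate predictions. The following minimal Python sketch shows the idea with three hypothetical replicates rather than the ten used in the study. The function name and all probability values are illustrative assumptions.

```python
def summarize_replicates(replicate_probs):
    """Per-centroid min, mean, and max across replicate model predictions.

    replicate_probs is a list with one entry per model; each entry is a list
    of probabilities in the same centroid order.
    """
    per_centroid = list(zip(*replicate_probs))
    return {
        "min": [min(values) for values in per_centroid],
        "mean": [sum(values) / len(values) for values in per_centroid],
        "max": [max(values) for values in per_centroid],
    }


# Hypothetical predictions from three replicate models at four centroids.
replicates = [
    [0.96, 0.10, 0.45, 0.02],
    [0.98, 0.20, 0.35, 0.01],
    [0.94, 0.15, 0.55, 0.03],
]
summary = summarize_replicates(replicates)
# The spread between "min" and "max" gives a rough view of model variability.
spread = [hi - lo for lo, hi in zip(summary["min"], summary["max"])]
```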


Areas of known Telephus spurge presence tend to have significantly higher probabilities of presence than their immediate surrounding areas or the average background probability, which is near zero, but these probabilities are not frequently greater than 50% (Figure 5). The area near St. Joseph Bay, with the greatest populations of Telephus spurge, has a number of locations with probability of presence greater than 50%, but these are mostly small, non-contiguous, localized spots with probabilities of 30% to 50% in between them. The area of known Telephus spurge presence around East Point has no probabilities of presence greater than 50%. The area around the known Panama City Beach populations has only one spot greater than 50%. A threshold of 50% is used as a reference point here because it suggests that presence is more likely than not. Other areas of probability that stand out from background and are not associated with currently known Telephus spurge presence include the north side of St. Andrew’s East Bay, the area around the northern study area boundary in Bay County, the western coast of Bay County, and the coast approaching Bald Point State Park in eastern Franklin County.

Figure 5. Predicted probability of presence of Telephus spurge based on the GLM Mean Weighted model. Shown in pink are the known locations of Telephus spurge presence inputted into the models.


Figure 6. A comparison of the predictions of the GLM Weighted minimum, mean, and maximum models. These maps were derived using the smallest, average, and largest values for each centroid of those ten weighted models, respectively.

Comparison

Table 4 shows a rank comparison of variables for the GLM Mean Weighted and BRT 70 Out models, the best models from each researcher. Seven of the twelve variables in each model are the same, and one variable in the BRT model, May TCTW, had been removed from consideration for the GLMs because of its VIF score. Notably, the highest ranked variable in each model (Rock Type in the GLM, Soil Type in the BRT) is absent from its counterpart model. This is likely because both are categorical variables that may be highly correlated and may simply fill a similar role in each model. If so, then once a model algorithm preferred one of those top variables, adding the other may not have contributed enough additional accuracy to overcome the penalty for an additional variable, as occurs, for example, when using BIC for model selection. Because they were categorical variables, they were not subject to the VIF standards used to reduce multicollinearity among the numeric variables. To test this possibility, I created dummy variables with binary results for each of the categories within Rock Type and Soil Type, and then ran a VIF analysis. Five of the dummy variables had a VIF over 25, another was above twelve, and another was above six. For reference, the initial cutoff for removing a variable was a VIF greater than five. This suggests a good deal of multicollinearity between the categories of the two top variables and makes it likely that these variables play a similar role in their respective models.

Table 4. A ranking of variable importance for my GLM Mean Weighted model and for Alexa Mainella’s BRT 70 Out model.

Rank | GLM | BRT 70 Out
1 | Rock Type | Soil Type
2 | Distance to Road | May Precipitation
3 | August Precipitation | Land Use/Land Cover
4 | May Precipitation | Distance to Road
5 | February NDVI | Distance to Ocean
6 | Elevation | February TCTB
7 | Distance to Wetlands | May TCTB
8 | May TCTG | Distance to Wetland
9 | August Temperature | August Temperature
10 | August EVI | May TCTW
11 | February TCTW | February TCTW
12 | Distance to Ocean | February NDVI
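The overlap between the two rankings in Table 4 can be checked directly. The minimal Python sketch below lists the variables shared by both models along with their rank in each. The variable names are copied from Table 4; the helper function and the decision to treat "Distance to Wetland" and "Distance to Wetlands" as the same variable are assumptions made for the illustration.

```python
glm_rank = [
    "Rock Type", "Distance to Road", "August Precipitation", "May Precipitation",
    "February NDVI", "Elevation", "Distance to Wetlands", "May TCTG",
    "August Temperature", "August EVI", "February TCTW", "Distance to Ocean",
]
brt_rank = [
    "Soil Type", "May Precipitation", "Land Use/Land Cover", "Distance to Road",
    "Distance to Ocean", "February TCTB", "May TCTB", "Distance to Wetland",
    "August Temperature", "May TCTW", "February TCTW", "February NDVI",
]


def normalize(name):
    """Treat the singular and plural wetlands labels as one variable (assumption)."""
    return "Distance to Wetlands" if name == "Distance to Wetland" else name


brt_normalized = [normalize(name) for name in brt_rank]

# Variables appearing in both models, in GLM rank order.
shared = [name for name in glm_rank if name in brt_normalized]

# Map each shared variable to its (GLM rank, BRT rank), counting from 1.
rank_pairs = {
    name: (glm_rank.index(name) + 1, brt_normalized.index(name) + 1)
    for name in shared
}
```

Running this gives seven shared variables, matching the count stated above, and shows, for example, that May Precipitation ranks fourth in the GLM and second in the BRT model.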

Comparing probability of presence across the study area, the BRT 70 Out model generated a far greater area with higher probabilities of presence than the GLM Mean Weighted model (Figure 7). These areas are mostly found around the known presences of Telephus spurge, but include a greater area of higher probability even outside these areas, though not significantly. Both model predictions show the majority of the study area near zero probability of Telephus spurge presence. The area of greatest probability in the GLM Mean Weighted model on Tyndall Air Force Base is generally consistent with that of the BRT 70 Out model. Figure 8 shows the areas in which the two models agree and disagree on Telephus spurge presence being more likely than not. Figure 9 shows the actual difference in probability of presence between the models for each point in the study area. The models tend to agree around the area of Tyndall Air Force Base and on the majority of area away from the coast, but disagree heavily in and surrounding areas of known presence of Telephus spurge. The differences in the predicted probability of presence in these areas are frequently greater than 40 or 60 percentage points, and less often 80 percentage points.
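The two comparisons mapped in Figures 8 and 9 can be expressed as a simple per-point calculation. The following minimal Python sketch classifies agreement at the 50% threshold and computes the difference in percentage points. It is an illustration only and not the Map Comparison Kit procedure; the function name, labels, and probability values are hypothetical.

```python
def compare_predictions(glm_probs, brt_probs, threshold=0.5):
    """Label agreement at a threshold and report BRT minus GLM in percentage points."""
    results = []
    for glm_p, brt_p in zip(glm_probs, brt_probs):
        glm_present = glm_p > threshold
        brt_present = brt_p > threshold
        if glm_present and brt_present:
            label = "both present"
        elif not glm_present and not brt_present:
            label = "both absent"
        elif glm_present:
            label = "GLM only"
        else:
            label = "BRT only"
        difference_points = round((brt_p - glm_p) * 100, 1)
        results.append((label, difference_points))
    return results


# Hypothetical probabilities at four points in the study area.
glm = [0.95, 0.10, 0.30, 0.60]
brt = [0.92, 0.05, 0.85, 0.20]
comparison = compare_predictions(glm, brt)
```

Note that two models can agree on the thresholded category while still differing by many percentage points, which is why both views are useful.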


Figure 7. Probabilities of presence of Telephus spurge in the BRT 70 Out model (top) and GLM Mean Weighted model (bottom). Maps on the left show a continuous scale, while maps on the right show where presence is greater or lesser than 50%.


Figure 8. Map distinguishing areas of agreement and disagreement between the GLM Mean Weighted and BRT 70 Out model using categories of greater or lesser than 50% probability of presence of Telephus spurge.

Figure 9. Map shows the amount of percentage point disagreement between GLM Mean Weighted and BRT 70 Out model on the probability of Telephus spurge presence.


Consensus

The consensus model results show the pattern expected from averaging the two models together. The area around Tyndall Air Force Base has the highest probability of presence of Telephus spurge, frequently greater than 90% (Figure 10). Areas surrounding known presences tend to have lower probabilities, generally between 40% and 90%. When considering probabilities only in terms of more likely than not, the areas surrounding known presences are somewhat localized, particularly around the smaller populations of Telephus spurge in Bay and Franklin Counties (Figure 11).

Figure 10. Predicted probability of presence of Telephus spurge based on the Consensus model. Shown in pink are the known locations of Telephus spurge presence initially used in creating the model.


Figure 11. Predicted probability of presence of Telephus spurge based on the Consensus model. Shown as greater than or lesser than 50% probability.

To summarize the important decisions and findings described in the methodology and results sections, Figure 12 shows the decision points from Figure 1, updated to reflect the results and final determinations made in the study.


Figure 12. A flowchart of the methodological approach used in this study. Boxes in white show steps in which a final determination was made about an important study feature. Boxes in green show the subsequent actions followed.

Discussion:

Conclusions

While constructing these models, a new location for Telephus spurge was found at Tyndall Air Force Base (Figure 13). This datum was never used in any of the models, nor was it used to adjust model outcomes, which were completed before discovery of this new location. This discovery represents an incidental example of ground-truthing and a minor validation of Alexa’s BRT 70 Out model, the GLM Mean Weighted model, and subsequently the Consensus model. Each of these models was effective at identifying the location of the newly discovered Telephus spurge population on Tyndall Air Force Base as an area of high probability of presence, all of them showing probabilities greater than 90%. The Consensus model is likely to be an effective tool for predicting suitable habitat, and it should be used in efforts to find new populations or when selecting transplantation sites. While probabilities greater than 50% are not frequently found away from the coast, inland probabilities that stand out from the background should be investigated if seeking to protect the species from the future threat of sea level rise.

Areas of higher probability from the Consensus map, outside of currently known populations, include the northern side of St. Andrew’s East Bay, the area around the north and northwestern study area boundary in Bay County, the area north of Apalachicola’s East Bay, the western coast of Bay County, and the coast approaching Bald Point State Park in eastern Franklin County. When investigating areas highlighted by the models, localized focus should consider the presence of sandy soils, long leaf pines, wire grass, and mowing or fire disturbance. One interesting area of generally higher probability follows Cedar Creek, which extends from Deer Creek Lake just above St. Andrew’s North Bay. These probabilities are likely tied to the “alluvium” rock type there, which stands out among the “clay or mud” surroundings (Appendix D). Other variables are surely at play, however, as other areas of alluvium are not consistently tied to higher probabilities of presence. But as mentioned previously in the Results-Model Selection portion of the study, Rocktype is the most important variable, and “beach sand” and “alluvium” stand out as the only categories of rock type not presenting a negative association with Telephus spurge in the model.

Limitations and Areas for Improvement

The data used for this study was limited in its nature and gathered without consistent protocol. The data came from several agencies gathered at distant time periods over several decades. The data was recorded in several different formats that required reconciliation. The data consisted only of presences without regard for failure to find Telephus spurge. It did not include search hours and only some of the data provided densities. Searches were not randomized or approached in any standardized manner. Some of the populations used for data in the models have since been extirpated. These conditions naturally limited the study potential and accuracy. Gathering data completely and systematically with standardized methods using well thought out protocols has the ability to improve modelling outcomes and versatility. Developing and implementing such protocols throughout relevant agencies should be a priority and considered with potential study goals in mind whenever possible.

Surveying designs for SDMs should vary based on the target species, consideration of environmental variables, and resource availability (Tessarolo et al. 2014). No survey design will be perfect, and there are often tradeoffs between sampling bias, survey effort, and discovery of species presence (Edwards et al. 2006). The randomization required to reduce sampling bias often increases the time and effort needed to survey, and can reduce discovery of presences because expert knowledge is not used to select survey sites (Edwards et al. 2006). As discovery of presence sites should be important for SDMs of rare and ephemeral species like Telephus spurge, I would not recommend solely randomized surveying that prohibits targeted searches of high probability areas. However, it is still important to mitigate the effects of sampling bias, and so I would recommend additional surveying that is either randomized or stratified by important environmental variables, particularly within a grid system that spatially spreads surveying efforts throughout the study area. Given the nature of species like Telephus spurge, repeated visits to the same site are warranted, but may still not justify a true absence designation when the species is not found. When Telephus spurge is discovered, efforts should be made to find the extent of the population and to make density estimations when feasible. While presence data may be the only requirement for some SDMs, having information on abundance can only add to model possibilities. One intriguing thought is to create disturbance at survey sites through mowing or controlled burning to induce growth of potentially dormant Telephus spurge.
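One way the stratified component of such a survey design could be drawn is sketched below in Python. This is a hypothetical illustration of the recommendation above, not a protocol from this study. The function name, the grid cell identifiers, and the assignment of cells to strata are invented for the example, with rock type categories used as the stratifying variable.

```python
import random
from collections import defaultdict


def stratified_sample(cells, per_stratum, seed=0):
    """Randomly pick up to per_stratum grid cells from each stratum.

    cells is a list of (cell_id, stratum) pairs, for example grid cells
    labeled by an environmental variable of interest.
    """
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    by_stratum = defaultdict(list)
    for cell_id, stratum in cells:
        by_stratum[stratum].append(cell_id)
    chosen = {}
    for stratum, ids in sorted(by_stratum.items()):
        count = min(per_stratum, len(ids))
        chosen[stratum] = sorted(rng.sample(ids, count))
    return chosen


# Hypothetical grid cells labeled by rock type (not study data).
cells = [
    (1, "beach sand"), (2, "beach sand"), (3, "beach sand"),
    (4, "alluvium"), (5, "alluvium"),
    (6, "clay or mud"), (7, "clay or mud"), (8, "clay or mud"), (9, "clay or mud"),
]
survey_cells = stratified_sample(cells, per_stratum=2, seed=42)
```

Targeted searches of high probability areas could then be added on top of this random draw, so that both discovery and bias reduction are served.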

The GLM model proved to be rather inflexible in this study. Convergence, singularity, and computational difficulties were common. Some of these difficulties stemmed from using extensive categorical variables that, along with statistical preferences, constrained the pseudo-absence data sets to being neither too small nor too large. These difficulties prevented interactions from being considered in the model, which has the potential to greatly limit model accuracy. The limited number of data points also proved problematic and prevented full cross validation from being used in AUC metrics of model accuracy. Cross-validated AUC considers only unseen data and so is a much more telling measure of a model’s predictive ability than measures using the data on which the model was trained. Extensive categorical variables should be used cautiously. Re-categorizing the variable types to reduce and generalize the categories may address some of these issues, though it is not always possible, as was the case in this study. In addition, creating dummy variables with binary responses for each category within the original variable (as done in the post-model VIF analysis) may further alleviate some problems. If the issues cannot be addressed, one may have to consider dropping the variable from the model.

While GLMs are a tried and true model for predicting species distribution, they are not always the easiest or most accurate models available. While some studies suggest they are less prone to overfitting than machine learning models (Wenger and Olden 2012), one should take into account the general and specific limitations regarding their data, their study goals, and their resource availability. Many machine learning models are flexible and designed with simple and intuitive coding or interfaces. As long as one does not simply forgo proper model parameterization for default settings, machine learning models may be a preferred option in many situations.
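To make the distinction between training-data AUC and cross-validated AUC concrete, the following minimal Python sketch shows how K-fold partitions keep every test observation unseen during fitting, along with a rank-based AUC calculation. It is a generic illustration and not code from this study; the function names and example values are hypothetical, and no model fitting is shown.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k folds; return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    all_indices = set(indices)
    return [(sorted(all_indices - set(fold)), sorted(fold)) for fold in folds]


def auc(labels, scores):
    """Rank-based AUC: chance that a random presence outscores a random absence.

    Ties between a presence and an absence count as half.
    """
    presences = [s for label, s in zip(labels, scores) if label == 1]
    absences = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum((p > a) + 0.5 * (p == a) for p in presences for a in absences)
    return wins / (len(presences) * len(absences))


# Each observation lands in exactly one test fold, so it is always scored
# by a model that never saw it during fitting.
splits = k_fold_indices(n=10, k=5, seed=1)

# Hypothetical held-out labels and predicted probabilities.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

With very few presences, some folds may contain no presences at all, which is one reason limited data can prevent full cross validation.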
When possible, the use of replicates within a model type and consensus models across model types is preferable to individual models. Each tends to reduce natural variability and generate more robust results. Both mean and AUC-weighted methods for consensus models have been shown in previous studies to produce better results than several alternative methods (Marmion et al. 2009). Measuring model accuracy may be improved through the use of K-fold cross validation when possible.

Not all potentially useful variables were incorporated into this study, especially in the areas of disturbance and specific biotic associations. Fire disturbance was excluded because of its inconsistent and temporal nature. Available data was limited and inconsistent, and the quick growth of vegetation in Florida often hides the effects of fire disturbance from satellite imagery after a short time. Biotic associations with Telephus spurge, such as long leaf pine and wire grass presence, were also absent from the study due to data limitations. We were unable to find data of sufficient coverage or resolution to include in our variables.

Finally, this study limited the study area to the three counties in which Telephus spurge has ever been found. While adjacent counties may have some confounding factors that make them less likely to harbor current or future populations of Telephus spurge, they need not be wholly removed from consideration. Future studies may want to consider including adjacent counties and gathering applicable disturbance and biotic data of sufficient scale and scope to further inform and refine SDMs for current populations. The long term focus for Telephus spurge conservation should likely be on climate change and sea level rise. Incorporating climatic change into SDMs and determining potential habitat sufficiently inland and at higher elevation should be a priority to avoid the threat of extinction.

Figure 13. Predicted probability of presence of Telephus spurge based on the Consensus model. Pink star designates location of newly discovered population on Tyndall Air Force Base.


Literature Cited:

Allen, Michael Patrick. 1997. The problem of multicollinearity. Understanding Regression Analysis. Plenum Press, 176-180.
Barbet-Massin, Morgane, Frédéric Jiguet, Cécile Hélène Albert, and Wilfried Thuiller. 2012. Selecting pseudo-absences for species distribution models: how, where and how many? Methods in Ecology and Evolution 3: 327-338.
Boria, Robert A., Link E. Olson, Steven M. Goodman, and Robert P. Anderson. 2014. Spatial filtering to reduce sampling bias can improve the performance of ecological niche models. Ecological Modelling 275: 73-77.
Biodiversity and Climate Change Virtual Laboratory. ‘Absence data’ 2016. https://support.bccvl.org.au/support/solutions/articles/6000127043-absence-data. Accessed 11/1/2016
Bridges, Edwin L., and Steve L. Orzell. 2002. Euphorbia (Euphorbiaceae) section Tithymalus subsection Inundatae in the southeastern United States. Lundellia. 59-78.
Campbell, James B. and Randolph H. Wynne. 2011. Introduction to Remote Sensing, 5th edition. New York: The Guilford Press.
Cavanaugh, Joseph E., and Andrew A. Neath. 1999. Generalizing the derivation of the Schwarz information criterion. Communications in Statistics-Theory and Methods 28, no. 1, 49-66.
Cavanaugh, Joseph E. 2009. 171:290 model selection, lecture VI: the Bayesian information criterion. Department of Biostatistics, Department of Statistics and Actuarial Science, University of Iowa.
Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement. 20(1), 37-46.
Ecological Resource Consultants. 2006. Management report for year 2006 for the North Glades Telephus Spurge mitigation. 26 pp.
Edwards, Thomas C. Jr., D. Richard Cutler, Niklaus E. Zimmermann, Linda Geiser, and Gretchen G. Moisen. 2006. Effects of sample survey design on the accuracy of classification tree models in species distribution models. Ecological Modelling. 199: 132-141.
Elith, Jane, Catherine H. Graham, Robert P. Anderson, Miroslav Dudík, Simon Ferrier, Antoine Guisan, Robert J. Hijmans, Falk Huettmann, John R. Leathwick, Anthony Lehmann, Jin Li, Lucia G. Lohmann, Bette A. Loiselle, Glenn Manion, Craig Moritz, Miguel Nakamura, Yoshinori Nakazawa, Jacob McC. Overton, A. Townsend Peterson, Steven J. Phillips, Karen Richardson, Ricardo Scachetti-Pereira, Robert E. Schapire, Jorge Soberón, Stephen Williams, Mary S. Wisz and Niklaus E. Zimmermann. 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29: 129-151.
Elith, Jane and John R. Leathwick. 2009. Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. The Annual Review of Ecology, Evolution, and Systematics 40: 677-697.
Florida Geologic Survey. 2001. Geologic Map of the State of Florida. https://mrdata.usgs.gov/geology/state/state.php?state=FL
Franklin, Janet. 2013. Species distribution models in conservation biogeography: developments and challenges. Diversity and Distributions. 19: 1217-1223.
FWC-FWRI. 2014. Cooperative Land Cover Version 3.0. http://www.fnai.org/LandCover.cfm
Guisan, Antoine and Niklaus E. Zimmermann. 2000. Predictive habitat distribution models in ecology. Ecological Modelling. Volume 135, Issues 2-3, 5 December 2000, pages 147-186.
Guisan, Antoine, Thomas Edwards Jr, and Trevor Hastie. 2002. Generalized linear and generalized additive models in studies of species distributions: setting the scene. Ecological Modelling. Volume 157, Issues 2-3, 30 November 2002, pages 89-100.
Institute for Digital Research and Education. ‘FAQ’ 2016. http://www.ats.ucla.edu/stat/mult_pkg/faq/general/complete_separation_logit_models.htm Accessed 11/28/2016
Kass, Robert E. and Adrian E. Raftery. 1995. Bayes factors. Journal of the American Statistical Association, 90(430), 773-795.
Kruckeberg, Arthur R. 1969. Soil diversity and the distribution of plants, with examples from western North America. Madroño, 20(3), 129-154.
Mainella, Alexa M. 2016. Comparison of maxent and boosted regression tree model performance in predicting the spatial distribution of threatened plant, telephus spurge (euphorbia telephioides). Master’s thesis. Miami University. https://etd.ohiolink.edu/pg_10?210588120183563::NO:10:P10_ETD_SUBID:114741 Accessed 8/16/16
Marmion, Mathieu, Mila Parviainen, Miska Luoto, Risto K. Heikkinen, and Wilfried Thuiller. 2009. Evaluation of consensus methods in predictive species distribution modeling. Diversity and Distributions. 15: 59-69.
Marschner, Ian C. 2011. glm2: fitting generalized linear models with convergence problems. The R Journal, 3(2), 12-15.
Milligan, Glenn W. and Martha C. Cooper. 1988. A Study of Standardization of Variables in Cluster Analysis. Journal of Classification, 5: 181-204.
Package ‘relaimpo’ 2015. CRAN Repository. https://cran.r-project.org/web/packages/relaimpo/relaimpo.pdf Accessed 8/16/16
Peet, Robert and Dorothy J. Allard. 1993. vegetation of the southern Atlantic and eastern Gulf Coast regions: a preliminary classification. In: Hermann, S.M. (ed.) The longleaf pine ecosystem: ecology, restoration and management. 45-81.
Peterson, Andrew T. and Jorge Soberon. 2012. Integrating fundamental concepts of ecology, biogeography, and sampling into effective ecological niche modeling and species distribution modeling. Plant Biosystems. 146: 789-796.
Platt, William J. 1999. 2 Southeastern Pine Savannas. Savannas, barrens, and rock outcrop plant communities of North America. Cambridge University Press.
Phillips, Steven J., Robert P. Anderson, and Robert E. Schapire. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190.3, 231-259.
Phillips, Steven J. and Miroslav Dudik. 2008. Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31: 161-175.
Searcy, K., Wilson, B., and Fownes, J. 2003. Influence of Bedrock and Aspect on Soils and Plant Distribution in the Holyoke Range, Massachusetts. The Journal of the Torrey Botanical Society, 130(3), 158-169.
Strahler, Allen H. 1978. Response of Woody Species to Site Factors of Slope Angle, Rock Type, and Topographic Position in Maryland as Evaluated by Binary Discriminant Analysis. Journal of Biogeography, 5(4), 403-423.
Tessarolo, Geiziane, Thiago F. Rangel, Miguel B. Araujo, and Joaquín Hortal. 2014. Diversity and Distributions. 20. 1258-1269.
Trapnell, Dorset W., J.L. Hamrick, and Vivian Negrón-Ortiz. 2012. Genetic diversity within a threatened, endemic North American species, Euphorbia telephioides (Euphorbiaceae). Conservation Genetics. 7: 743-751.
U.S. Census Bureau. 2007. County Boundaries. http://www.fgdl.org/metadataexplorer/explorer.jsp
U.S. Census Bureau, TIGER/line Shapefiles. 2014. https://www.census.gov/geo/maps-data/data/tiger-line.html
U.S. Department of Agriculture, Natural Resources Conservation Service. 2013. Soil Survey Geographic (SSURGO). http://websoilsurvey.nrcs.usda.gov/app/WebSoilSurvey.aspx
U.S. Fish & Wildlife Service. 1992. Threatened status for three Florida plants. Federal Register. 57, 90: 19813-19819.
U.S. Fish and Wildlife Service. 2014. 5-year Review: Summary and Evaluation (Telephus spurge). Unpublished report.
U.S. Geologic Service. EarthExplorer. 2014. Landsat 8 images. http://earthexplorer.usgs.gov/
U.S. Geologic Service. National Elevation Dataset (NED). https://viewer.nationalmap.gov/basic/?basemap=b1&category=ned,nedsrc&title=3DEP%20View
VanDerWal, Jeremy, Luke P. Shoo, Catherine Graham, and Stephen E. Williams. 2009. Selecting pseudo-absence data for presence-only distribution modeling: How far should you stray from what you know? Ecological Modelling. 220: 589-594.
Wenger, Seth J. and Julian D. Olden. 2012. Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods in Ecology and Evolution. 3: 260-267.
Williams, John N., Changwan Seo, James Thorne, Julie K. Nelson, Susan Erwin, Joshua M. O’Brien, and Mark W. Schwartz. 2009. Using species distribution models to predict new occurrences for rare plants. Diversity and Distributions 15: 565-576.
Wit, Ernst, Edwin van den Heuvel, and Jan-Willem Romeijn. 2012. All models are wrong...: an introduction to model uncertainty. Statistica Neerlandica 66.3, 217-236.
WorldClim. 2014. WorldClim 1.4: Current conditions. http://www.worldclim.org/current

APPENDIX A – TABLE OF ALL VARIABLES CONSIDERED IN STUDY

Table 5. Table of all study variables considered

Variable Name | Abbreviation | Source | Original Dataset Name | Dataset Type | Resolution | Cat./Contin. | Present in Final Model
Land Use/Land Cover | LULC | Florida Natural Areas Inventory (FNAI 2014) | Cooperative Land Cover Map Version 3.0 | Feature Class | 10m | Categorical | No
Soil Type | Soil Type | Natural Resources Conservation Service (NRCS 2013) | Soil Survey Geographic (SSURGO) | Feature Class | 10m | Categorical | No
Rock Type | Rock Type | Florida Geological Survey (2001) | Geologic Map of the State of Florida | Shapefile | 100m | Categorical | Yes
Average February Precipitation | February Precipitation | WorldClim | WorldClim 1.4: Current conditions | Raster | 930m | Continuous | No
Average May Precipitation | May Precipitation | WorldClim | WorldClim 1.4: Current conditions | Raster | 930m | Continuous | Yes
Average August Precipitation | August Precipitation | WorldClim | WorldClim 1.4: Current conditions | Raster | 930m | Continuous | Yes
Average February Temperature | February Temperature | WorldClim | WorldClim 1.4: Current conditions | Raster | 930m | Continuous | No
Average May Temperature | May Temperature | WorldClim | WorldClim 1.4: Current conditions | Raster | 930m | Continuous | No
Average August Temperature | August Temperature | WorldClim | WorldClim 1.4: Current conditions | Raster | 930m | Continuous | Yes
Normalized Difference Vegetative Index for February | February NDVI | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | Yes
Normalized Difference Vegetative Index for May | May NDVI | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Enhanced Vegetative Index for February | February EVI | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Enhanced Vegetative Index for May | May EVI | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Enhanced Vegetative Index for August | August EVI | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | Yes
Tasseled Cap Transformation Brightness for February | February TCTB | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Tasseled Cap Transformation Brightness for May | May TCTB | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Tasseled Cap Transformation Brightness for August | August TCTB | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Tasseled Cap Transformation Greenness for February | February TCTG | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Tasseled Cap Transformation Greenness for May | May TCTG | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | Yes
Tasseled Cap Transformation Greenness for August | August TCTG | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Tasseled Cap Transformation Wetness for February | February TCTW | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | Yes
Tasseled Cap Transformation Wetness for May | May TCTW | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Tasseled Cap Transformation Wetness for August | August TCTW | U.S. Geological Survey (USGS), EarthExplorer | Landsat 8 images, 2014 | Raster | 30m | Continuous | No
Slope | Slope | U.S. Geological Survey (USGS) | National Elevation Dataset (NED) | DEM | 10m | Continuous | No
Distance to Wetlands | Distance to Wetlands | Florida Natural Areas Inventory (FNAI 2014) (*calculated from) | Cooperative Land Cover Map Version 3.0 | Feature Class | 10m | Continuous | Yes
Distance to Roads | Distance to Roads | U.S. Census Bureau (*calculated from) | TIGER/Line - All Roads - Bay, Gulf, Franklin | Shapefile | 24m | Continuous | Yes
Distance to Ocean | Distance to Ocean | U.S. Census Bureau (*calculated from polyline trace of Gulf Coast) | County Boundaries | Shapefile | 100m | Continuous | Yes

APPENDIX B – LIST OF ALL LAND USE/LAND COVER TYPES IN STUDY AREA

Table 6. Land use/land cover type, acreage, and % study area. Map files can be downloaded at http://myfwc.com/research/gis/applications/articles/Cooperative-Land-Cover

Land Use/Land Cover (LULC) Type | Area (acres) | % Study Area
Coniferous Plantations | 376,796 | 34.87
Hydric Pine Flatwoods | 115,895 | 10.73
Other Wetland Forested Mixed | 76,121 | 7.05
Floodplain Swamp | 76,102 | 7.04
Mixed Scrub-Shrub Wetland | 58,672 | 5.43
Tree Plantations | 57,995 | 5.37
Transportation | 45,500 | 4.21
Wet Prairie | 34,085 | 3.15
Rural Open | 29,257 | 2.71
Salt Marsh | 25,864 | 2.39
Residential, Med. Density - 2-5 Dwelling Units/AC | 23,341 | 2.16
Mixed Wetland Hardwoods | 13,571 | 1.26
Mesic Flatwoods | 10,245 | 0.95
Coastal Scrub | 10,044 | 0.93
Upland Coniferous | 9,713 | 0.90
Cypress | 7,924 | 0.73
Residential, High Density > 5 Dwelling Units/AC | 7,451 | 0.69
Mixed Hardwood-Coniferous | 6,889 | 0.64
Shrub and Brushland | 5,527 | 0.51
Marshes | 5,393 | 0.50
Commercial and Services | 5,380 | 0.50
Wet Flatwoods | 5,352 | 0.50
Utilities | 4,335 | 0.40
Riverine | 4,155 | 0.38
Lacustrine | 3,551 | 0.33
Institutional | 3,369 | 0.31
Sandhill | 3,309 | 0.31
Scrubby Flatwoods | 3,220 | 0.30
Pine Flatwoods and Dry Prairie | 3,098 | 0.29
Extractive | 2,902 | 0.27
Tidal Flat | 2,450 | 0.23
Sand Beach (Dry) | 2,383 | 0.22
Beach Dune | 2,271 | 0.21
Baygall | 1,996 | 0.18
Artificial Impoundment/Reservoir | 1,951 | 0.18
Titi Swamp | 1,754 | 0.16
Wet Coniferous Plantation | 1,646 | 0.15
Unimproved/Woodland Pasture | 1,554 | 0.14
Marine | 1,426 | 0.13
Golf Courses | 1,424 | 0.13
Floodplain Marsh | 1,409 | 0.13
Estuarine | 1,393 | 0.13
Alluvial Stream | 1,370 | 0.13
Improved Pasture | 1,296 | 0.12
Bare Soil/Clear Cut | 1,181 | 0.11
Sod Farms | 1,149 | 0.11
Basin Swamp | 1,061 | 0.10
Urban Open Forested | 1,032 | 0.10
Alluvial Forest | 1,029 | 0.10
Field Crops | 972 | 0.09
Coastal Interdunal Swale | 928 | 0.09
Rural Open Forested | 862 | 0.08
Coastal Grassland | 843 | 0.08
Industrial | 838 | 0.08
Non-vegetated Wetland | 831 | 0.08
Urban Open Land | 702 | 0.06
Upland Hardwood Forest | 661 | 0.06
Community rec. facilities | 655 | 0.06
Basin Marsh | 576 | 0.05
Shrub Bog | 529 | 0.05
Canal | 505 | 0.05
Floating/Emergent Aquatic Vegetation | 454 | 0.04
Aquacultural Ponds | 454 | 0.04
High Intensity Urban | 429 | 0.04
Sand Pine Scrub | 350 | 0.03
Maritime Hammock | 344 | 0.03
Residential, Low Density | 303 | 0.03
Rural Structures | 299 | 0.03
Flatwoods/Prairie/Marsh Lake | 269 | 0.02
Quarry Pond | 265 | 0.02
Depression Marsh | 241 | 0.02
Rural Open Pine | 228 | 0.02
Sand n Gravel Pits | 227 | 0.02
Cemeteries | 200 | 0.02
Palmetto Prairie | 200 | 0.02
Specialty Farms | 195 | 0.02
Hydric Hammock | 188 | 0.02
Tidally-Influenced Stream | 168 | 0.02
Cultural - Terrestrial | 165 | 0.02
Hardwood Plantations | 114 | 0.01
Communication | 112 | 0.01
Strip Mines | 97 | 0.01
Sewage Treatment Pond | 97 | 0.01
Urban Open Pine | 93 | 0.01
Orchards/Groves | 88 | 0.01
Mesic Hammock | 88 | 0.01
Other Hardwood Wetlands | 88 | 0.01
Industrial Cooling Pond | 87 | 0.01
Parks and Zoos | 86 | 0.01
Scrub | 77 | 0.01
Bay Swamp | 73 | 0.01
Coastal Uplands | 72 | 0.01
Wiregrass Savanna | 64 | 0.01
Live | 51 | 0.00
Cultural - Lacustrine | 46 | 0.00
Slough | 37 | 0.00
Trees | 37 | 0.00
Blackwater Stream | 27 | 0.00
Stormwater Treatment Areas | 26 | 0.00
Ornamentals | 25 | 0.00
Fallow Cropland | 25 | 0.00
Dome Swamp | 23 | 0.00
Coastal Dune Lake | 21 | 0.00
Unconsolidated Substrate | 20 | 0.00
Tree Nurseries | 19 | 0.00
Natural Rivers and Streams | 17 | 0.00
Successional Hardwood Forest | 16 | 0.00
Ballfields | 13 | 0.00
Other Open Lands - Rural | 8 | 0.00
Dry Flatwoods | 7 | 0.00
Other Coniferous Wetlands | 6 | 0.00
Oak - Cabbage Palm Forest | 6 | 0.00
Roads | 6 | 0.00
Coastal Hydric Hammock | 6 | 0.00
Vineyard and Nurseries | 5 | 0.00
Urban | 5 | 0.00
Isolated Freshwater Swamp | 4 | 0.00
Mowed Grass | 4 | 0.00
Cabbage Palm | 3 | 0.00
Rural | 3 | 0.00
Oyster Bar | 2 | 0.00
Artificial/Farm Pond | 1 | 0.00
Cabbage Palm Hammock | 1 | 0.00
Spoil Area | 1 | 0.00
Natural Lakes and Ponds | - | 0.00
Bottomland Forest | - | 0.00

APPENDIX C – LIST OF ALL SOIL TYPES IN STUDY AREA

Table 7. Soil type, acreage, and % study area. Map files can be downloaded at http://websoilsurvey.nrcs.usda.gov/app/WebSoilSurvey.aspx

Soil Type | Area (acres) | % Study Area
LEON SAND | 62,261 | 5.77
SCRANTON FINE SAND | 61,477 | 5.69
PLUMMER FINE SAND | 54,625 | 5.06
POTTSBURG SAND | 43,897 | 4.07
RUTLEGE SAND | 41,637 | 3.86
RUTLEGE FINE SAND | 38,749 | 3.59
PELHAM LOAMY FINE SAND | 36,722 | 3.40
BRICKYARD, CHOWAN, AND KENNER SOILS, FREQUENTLY FLOODED | 32,181 | 2.98
SCRANTON SAND, SLOUGH | 30,208 | 2.80
PICKNEY-PAMLICO COMPLEX, DEPRESSIONAL | 29,499 | 2.73
HURRICANE SAND | 27,657 | 2.56
ALBANY SAND, 0 TO 2 PERCENT SLOPES | 25,897 | 2.40
PAMLICO-DOROVAN COMPLEX | 24,689 | 2.29
RUTLEGE-PAMLICO COMPLEX | 20,285 | 1.88
PICKNEY AND RUTLEGE SOILS, DEPRESSIONAL | 19,817 | 1.84
CHOWAN, BRICKYARD, AND KENNER SOILS, FREQUENTLY FLOODED | 19,801 | 1.83
RAINS FINE SANDY LOAM | 19,757 | 1.83
LEON FINE SAND | 17,448 | 1.62
SURRENCY MUCKY FINE SAND, DEPRESSIONAL | 17,440 | 1.62
LEEFIELD SAND | 15,776 | 1.46
PLUMMER SAND | 15,673 | 1.45
MAUREPAS MUCK, FREQUENTLY FLOODED | 15,152 | 1.40
FOXWORTH SAND, 0 TO 5 PERCENT SLOPES | 14,937 | 1.38
LAKELAND SAND, 0 TO 5 PERCENT SLOPES | 13,084 | 1.21
LEEFIELD LOAMY FINE SAND | 13,025 | 1.21
BOHICKET AND TISONIA SOILS, TIDAL | 12,579 | 1.17
SURRENCY FINE SAND | 12,449 | 1.15
PANTEGO AND BAYBORO SOILS, DEPRESSIONAL | 12,199 | 1.13
CROATAN-SURRENCY COMPLEX, FREQUENTLY FLOODED | 12,071 | 1.12
CHIPLEY SAND, 0 TO 5 PERCENT SLOPES | 11,617 | 1.08
WATER | 11,590 | 1.07
OSIER FINE SAND | 10,766 | 1.00
ALBANY SAND | 10,167 | 0.94
RESOTA FINE SAND, 0 TO 5 PERCENT SLOPES | 9,848 | 0.91
PAMLICO-PICKNEY COMPLEX, FREQUENTLY FLOODED | 9,811 | 0.91
MEADOWBROOK SAND | 9,686 | 0.90
BRICKYARD SILTY CLAY, FREQUENTLY FLOODED | 9,224 | 0.85
LYNN HAVEN SAND | 8,679 | 0.80
ALAPAHA LOAMY FINE SAND | 8,552 | 0.79
MEADOWBROOK SAND, SLOUGH | 8,159 | 0.76
MEGGETT FINE SANDY LOAM, OCCASIONALLY FLOODED | 8,035 | 0.74
MEADOWBROOK FINE SAND, OCCASIONALLY FLOODED | 8,026 | 0.74
PELHAM SAND | 8,024 | 0.74
MANDARIN FINE SAND | 7,855 | 0.73
PICKNEY FINE SAND | 6,986 | 0.65
BAYVI LOAMY SAND | 6,566 | 0.61
BLANTON FINE SAND, 0 TO 5 PERCENT SLOPES | 6,207 | 0.57
DUCKSTON-RUTLEGE-COROLLA COMPLEX | 5,731 | 0.53
ARENTS, 0 TO 5 PERCENT SLOPES | 5,676 | 0.53
ALAPAHA LOAMY SAND | 5,119 | 0.47
KUREB SAND, 0 TO 5 PERCENT SLOPES | 4,956 | 0.46
MANDARIN SAND | 4,764 | 0.44
RIDGEWOOD SAND, 0 TO 5 PERCENT SLOPES | 4,585 | 0.42
ALLANTON SAND | 4,405 | 0.41
ORTEGA FINE SAND, 0 TO 5 PERCENT SLOPES | 4,333 | 0.40
PELHAM FINE SAND | 4,245 | 0.39
HARBESON MUCKY LOAMY SAND, DEPRESSIONAL | 4,232 | 0.39
KERSHAW SAND, 2 TO 5 PERCENT SLOPES | 4,188 | 0.39
LYNN HAVEN FINE SAND | 3,953 | 0.37
SAPELO SAND | 3,945 | 0.37
COROLLA SAND, 0 TO 5 PERCENT SLOPES | 3,678 | 0.34
PANTEGO SANDY LOAM | 3,664 | 0.34
WAHEE-MANTACHIE-OCKLOCKNEE COMPLEX, COMMONLY FLOODED | 3,645 | 0.34
DIREGO AND BAYVI SOILS, TIDAL | 3,628 | 0.34
DOROVAN-CROATAN COMPLEX, DEPRESSIONAL | 3,444 | 0.32
TOOLES-MEADOWBROOK COMPLEX, DEPRESSIONAL | 3,318 | 0.31
DOROVAN-PAMLICO COMPLEX, DEPRESSIONAL | 3,228 | 0.30
STILSON LOAMY FINE SAND, 0 TO 5 PERCENT SLOPES | 3,179 | 0.29
STILSON SAND, 0 TO 5 PERCENT SLOPES | 3,077 | 0.29
BLANTON SAND, 0 TO 5 PERCENT SLOPES | 2,996 | 0.28
ALBANY FINE SAND | 2,865 | 0.27
COROLLA-DUCKSTON COMPLEX, GENTLY UNDULATING, FLOODED | 2,799 | 0.26
BLADEN FINE SANDY LOAM | 2,599 | 0.24
DUCKSTON SAND, OCCASIONALLY FLOODED | 2,553 | 0.24
SAPELO FINE SAND | 2,470 | 0.23
AQUENTS, GENTLY UNDULATING | 2,371 | 0.22
FRIPP-COROLLA COMPLEX, 2 TO 30 PERCENT SLOPES | 2,342 | 0.22
CENTENARY SAND, 0 TO 5 PERCENT SLOPES | 2,315 | 0.21
BEACHES | 2,227 | 0.21
RAINS SAND | 1,957 | 0.18
NEWHAN-COROLLA COMPLEX, ROLLING | 1,918 | 0.18
MEADOWBROOK, MEGGETT, AND TOOLES SOILS, FREQUENTLY FLOODED | 1,819 | 0.17
BONSAI MUCKY FINE SAND, FREQUENTLY FLOODED | 1,800 | 0.17
DIREGO MUCK | 1,728 | 0.16
RUTLEGE LOAMY FINE SAND, DEPRESSIONAL | 1,653 | 0.15
PANSEY LOAMY SAND | 1,623 | 0.15
KUREB-COROLLA COMPLEX, ROLLING | 1,563 | 0.14
URBAN LAND | 1,533 | 0.14
BAYVI AND DIREGO SOILS, FREQUENTLY FLOODED | 1,508 | 0.14
TOOLES SAND | 1,483 | 0.14
EBRO-DOROVAN COMPLEX | 1,418 | 0.13
FOXWORTH SAND, 5 TO 8 PERCENT SLOPES | 1,335 | 0.12
POTTSBURG FINE SAND | 1,316 | 0.12
DUCKSTON-BOHICKET-COROLLA COMPLEX | 1,265 | 0.12
CHAIRES SAND | 1,240 | 0.11
PITS | 1,184 | 0.11
WATERS OF THE GULF OF MEXICO | 1,155 | 0.11
CLARENDON LOAMY FINE SAND, 2 TO 5 PERCENT SLOPES | 1,078 | 0.10
WAHEE FINE SANDY LOAM | 1,038 | 0.10
FUQUAY LOAMY FINE SAND | 1,004 | 0.09
COROLLA FINE SAND, 1 TO 5 PERCENT SLOPES | 874 | 0.08
WEHADKEE-MEGGETT COMPLEX, FREQUENTLY FLOODED | 829 | 0.08
OCILLA LOAMY FINE SAND, OVERWASH, OCCASIONALLY FLOODED | 809 | 0.07
TROUP SAND, 0 TO 5 PERCENT SLOPES | 794 | 0.07
STILSON FINE SAND | 724 | 0.07
KERSHAW SAND, 5 TO 12 PERCENT SLOPES | 697 | 0.06
RIDGEWOOD FINE SAND | 689 | 0.06
ALBANY SAND, 2 TO 5 PERCENT SLOPES | 669 | 0.06
DUCKSTON-DUCKSTON DEPRESSIONAL COMPLEX, FREQUENTLY FLOODED | 655 | 0.06
LAKELAND SAND, 8 TO 12 PERCENT SLOPES | 628 | 0.06
AQUENTS, NEARLY LEVEL | 607 | 0.06
DOTHAN-FUQUAY COMPLEX, 5 TO 8 PERCENT SLOPES | 529 | 0.05
BONIFAY SAND, 0 TO 5 PERCENT SLOPES | 516 | 0.05
QUARTZIPSAMMENTS, UNDULATING | 489 | 0.05
LYNCHBURG LOAMY FINE SAND | 346 | 0.03
KUREB FINE SAND, 3 TO 8 PERCENT SLOPES | 340 | 0.03
DOTHAN LOAMY SAND, 2 TO 5 PERCENT SLOPES | 296 | 0.03
LAKELAND SAND, 5 TO 8 PERCENT SLOPES | 266 | 0.02
BLANTON FINE SAND, 5 TO 8 PERCENT SLOPES | 224 | 0.02
CHIPLEY SAND, 5 TO 8 PERCENT SLOPES | 179 | 0.02
TROUP SAND, 5 TO 8 PERCENT SLOPES | 139 | 0.01
UDORTHENTS, NEARLY LEVEL | 115 | 0.01
STILSON SAND, 5 TO 8 PERCENT SLOPES | 112 | 0.01
LUCY LOAMY FINE SAND, 0 TO 5 PERCENT SLOPES | 109 | 0.01
BONIFAY SAND, 5 TO 8 PERCENT SLOPES | 24 | 0.00
KENNANSVILLE-EULONIA COMPLEX, 0 TO 5 PERCENT SLOPES | 23 | 0.00
TROUP SAND, 8 TO 12 PERCENT SLOPES | 14 | 0.00

APPENDIX D – MAP OF ROCK TYPES IN STUDY AREA

Figure 14. Map of rock types in study area.

APPENDIX E – MAPS OF PRECIP. AND TEMP. IN STUDY AREA

Figure 15. Average monthly precipitation in mm for February (a), May (b), and August (c).

Figure 16. Average monthly temperature in °Celsius for February (a), May (b), and August (c).

APPENDIX F – MAPS OF SPECTRAL VEGETATIVE INDICES IN STUDY AREA

Figure 17. Maps showing NDVI for February (a), May (b), and August (c) of 2014.

Figure 18. Maps showing EVI for February (a), May (b), and August (c) of 2014.
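
For readers unfamiliar with the indices mapped in Figures 17 and 18, the following is a minimal sketch of the standard NDVI and EVI formulas. It assumes surface reflectance values scaled from 0 to 1 for the Landsat 8 blue, red, and near-infrared bands (bands 2, 4, and 5). The function names and sample reflectances are hypothetical, and the indices in this study may have been derived with different software or scaling.

```python
def ndvi(nir, red):
    """Normalized difference index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """Enhanced index with the commonly used coefficients
    G = 2.5, C1 = 6, C2 = 7.5, L = 1."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


# Hypothetical reflectances for one vegetated pixel (not study data).
pixel = {"blue": 0.05, "red": 0.10, "nir": 0.50}

pixel_ndvi = ndvi(pixel["nir"], pixel["red"])
pixel_evi = evi(pixel["nir"], pixel["red"], pixel["blue"])
print(f"NDVI = {pixel_ndvi:.3f}, EVI = {pixel_evi:.3f}")
```

Both indices rise with green vegetation cover; EVI adds a blue-band correction that reduces sensitivity to atmospheric and soil background effects.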

Figure 19. Maps showing TCTB for February (a), May (b), and August (c) of 2014.

Figure 20. Maps showing TCTG for February (a), May (b), and August (c) of 2014.

Figure 21. Maps showing TCTW for February (a), May (b), and August (c) of 2014.

APPENDIX G – MAPS OF ELEVATION AND SLOPE

Figure 22. Map of elevation.

Figure 23. Map of percent slope.

APPENDIX H – TABLE OF AUC RESULTS FOR REPLICATE GLM MODELS

Table 8. AUC results for unweighted and weighted GLM models using full training data.

Against Training Data (unweighted models)

Model | Weights | Threshold | AUC | Omission rate | Sensitivity | Specificity | prop.correct | kappa
model1 | 1:1 | 0.5 | 0.807 | 0.281 | 0.719 | 0.895 | 0.851 | 0.606
model2 | 1:1 | 0.5 | 0.783 | 0.343 | 0.657 | 0.909 | 0.846 | 0.580
model3 | 1:1 | 0.5 | 0.797 | 0.311 | 0.689 | 0.905 | 0.851 | 0.599
model4 | 1:1 | 0.5 | 0.811 | 0.294 | 0.706 | 0.916 | 0.864 | 0.631
model5 | 1:1 | 0.5 | 0.800 | 0.298 | 0.702 | 0.898 | 0.849 | 0.599
model6 | 1:1 | 0.5 | 0.793 | 0.318 | 0.682 | 0.905 | 0.849 | 0.593
model7 | 1:1 | 0.5 | 0.805 | 0.301 | 0.699 | 0.911 | 0.858 | 0.616
model8 | 1:1 | 0.5 | 0.804 | 0.309 | 0.691 | 0.916 | 0.860 | 0.619
model9 | 1:1 | 0.5 | 0.784 | 0.326 | 0.674 | 0.895 | 0.839 | 0.570
model10 | 1:1 | 0.5 | 0.809 | 0.294 | 0.706 | 0.911 | 0.860 | 0.623
Models Average | 1:1 | 0.5 | 0.799 | 0.307 | 0.693 | 0.906 | 0.853 | 0.604

Against Training Data (weighted models)

Model | Weights | Threshold | AUC | Omission rate | Sensitivity | Specificity | prop.correct | kappa
model1_weighted | 1:3 | 0.5 | 0.837 | 0.114 | 0.886 | 0.788 | 0.812 | 0.574
model2_weighted | 1:3 | 0.5 | 0.840 | 0.116 | 0.884 | 0.795 | 0.817 | 0.582
model3_weighted | 1:3 | 0.5 | 0.841 | 0.116 | 0.884 | 0.797 | 0.819 | 0.585
model4_weighted | 1:3 | 0.5 | 0.846 | 0.114 | 0.886 | 0.807 | 0.826 | 0.599
model5_weighted | 1:3 | 0.5 | 0.845 | 0.110 | 0.890 | 0.800 | 0.822 | 0.592
model6_weighted | 1:3 | 0.5 | 0.839 | 0.118 | 0.882 | 0.796 | 0.817 | 0.582
model7_weighted | 1:3 | 0.5 | 0.845 | 0.116 | 0.884 | 0.806 | 0.825 | 0.597
model8_weighted | 1:3 | 0.5 | 0.848 | 0.105 | 0.895 | 0.801 | 0.825 | 0.599
model9_weighted | 1:3 | 0.5 | 0.826 | 0.139 | 0.861 | 0.791 | 0.809 | 0.561
model10_weighted | 1:3 | 0.5 | 0.844 | 0.116 | 0.884 | 0.803 | 0.824 | 0.594
Models Average | 1:3 | 0.5 | 0.841 | 0.116 | 0.884 | 0.798 | 0.820 | 0.587

Table 9. AUC results for unweighted and weighted GLM models using partial training data.

Against Training Presences + Random Background Pseudo-absences (unweighted models)

Model | Weights | Threshold | AUC | Omission rate | Sensitivity | Specificity | prop.correct | kappa
model1 | 1:1 | 0.5 | 0.847 | 0.281 | 0.719 | 0.976 | 0.847 | 0.695
model2 | 1:1 | 0.5 | 0.818 | 0.343 | 0.657 | 0.979 | 0.818 | 0.637
model3 | 1:1 | 0.5 | 0.826 | 0.311 | 0.689 | 0.963 | 0.826 | 0.652
model4 | 1:1 | 0.5 | 0.831 | 0.294 | 0.706 | 0.957 | 0.831 | 0.663
model5 | 1:1 | 0.5 | 0.828 | 0.298 | 0.702 | 0.953 | 0.828 | 0.655
model6 | 1:1 | 0.5 | 0.825 | 0.318 | 0.682 | 0.968 | 0.825 | 0.650
model7 | 1:1 | 0.5 | 0.828 | 0.301 | 0.699 | 0.957 | 0.828 | 0.655
model8 | 1:1 | 0.5 | 0.836 | 0.309 | 0.691 | 0.981 | 0.836 | 0.672
model9 | 1:1 | 0.5 | 0.817 | 0.326 | 0.674 | 0.961 | 0.817 | 0.635
model10 | 1:1 | 0.5 | 0.838 | 0.294 | 0.706 | 0.970 | 0.838 | 0.676
Models Average | 1:1 | 0.5 | 0.829 | 0.307 | 0.693 | 0.966 | 0.829 | 0.659

Against Training Presences + Random Background Pseudo-absences (weighted models)

Model | Weights | Threshold | AUC | Omission rate | Sensitivity | Specificity | prop.correct | kappa
model1_weighted | 1:3 | 0.5 | 0.903 | 0.114 | 0.886 | 0.919 | 0.903 | 0.805
model2_weighted | 1:3 | 0.5 | 0.899 | 0.116 | 0.884 | 0.914 | 0.899 | 0.798
model3_weighted | 1:3 | 0.5 | 0.901 | 0.116 | 0.884 | 0.918 | 0.901 | 0.801
model4_weighted | 1:3 | 0.5 | 0.895 | 0.114 | 0.886 | 0.904 | 0.895 | 0.790
model5_weighted | 1:3 | 0.5 | 0.897 | 0.110 | 0.890 | 0.904 | 0.897 | 0.794
model6_weighted | 1:3 | 0.5 | 0.894 | 0.118 | 0.882 | 0.906 | 0.894 | 0.788
model7_weighted | 1:3 | 0.5 | 0.901 | 0.116 | 0.884 | 0.918 | 0.901 | 0.801
model8_weighted | 1:3 | 0.5 | 0.927 | 0.105 | 0.895 | 0.959 | 0.927 | 0.854
model9_weighted | 1:3 | 0.5 | 0.883 | 0.139 | 0.861 | 0.904 | 0.883 | 0.766
model10_weighted | 1:3 | 0.5 | 0.906 | 0.116 | 0.884 | 0.929 | 0.906 | 0.813
Models Average | 1:3 | 0.5 | 0.901 | 0.116 | 0.884 | 0.918 | 0.901 | 0.801
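
The threshold-dependent columns in Tables 8 and 9 are linked by standard confusion-matrix identities, which the sketch below illustrates. The function name and the counts are hypothetical and are not the study's data; they were chosen only so that the outputs fall near the unweighted averages in Table 8. The sketch is not a description of the software used to produce the tables.

```python
def threshold_metrics(tp, fn, fp, tn):
    """Accuracy metrics from confusion-matrix counts at a fixed threshold.

    tp/fn: presences predicted present/absent.
    fp/tn: absences predicted present/absent.
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    omission_rate = fn / (tp + fn)          # equals 1 - sensitivity
    prop_correct = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's kappa.
    p_pred_pos = (tp + fp) / n
    p_obs_pos = (tp + fn) / n
    expected = p_pred_pos * p_obs_pos + (1 - p_pred_pos) * (1 - p_obs_pos)
    kappa = (prop_correct - expected) / (1 - expected)
    return {
        "omission_rate": omission_rate,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mean_sens_spec": (sensitivity + specificity) / 2,
        "prop_correct": prop_correct,
        "kappa": kappa,
    }


# Hypothetical counts with one presence for every three absences.
metrics = threshold_metrics(tp=70, fn=30, fp=30, tn=270)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In every row of both tables the omission rate equals one minus the sensitivity. The tabulated AUC values also appear to match the mean of sensitivity and specificity at the 0.5 threshold (for example, (0.719 + 0.895) / 2 = 0.807 for model1 in Table 8); this is an observation from the reported numbers, not a statement of how the AUC was calculated.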
