
Spatio-statistical Predictions of Vernal Pool Locations in : Incorporating the Spatial Component into Ecological Modeling

Tina A. Cormier, Russell G. Congalton, and Kimberly J. Babbitt

Tina A. Cormier is with the Woods Hole Research Center, 149 Woods Hole Rd., Falmouth, MA 02540, and formerly with the Department of Natural Resources & the Environment, 114 James Hall, University of , Durham, NH 03824.

Russell G. Congalton and Kimberly J. Babbitt are with the Department of Natural Resources & the Environment, 114 James Hall, University of New Hampshire, Durham, NH 03824 ([email protected]).

Photogrammetric Engineering & Remote Sensing Vol. 79, No. 1, January 2013, pp. 25–35. 0099-1112/13/7901–25/$3.00/0 © 2013 American Society for Photogrammetry and Remote Sensing

Abstract
Vernal pools are small, isolated, depressions that experience cyclical periods of inundation and drying. Many species have evolved strategies to utilize the unique characteristics of vernal pools; however, their small size, seasonal nature, and isolation from other, larger water bodies, suggest increased risk of damage or loss by development. The objectives of this research were to statistically determine physical predictors of vernal pool presence, and subsequently, to represent the output cartographically for use as a conservation tool. Logistic regression and Classification and Regression Tree (CART) methods were used to identify important predictors of 405 known vernal pools across northeastern Massachusetts. The CART models performed most favorably, achieving map accuracies as high as 97 percent and providing a set of rules for vernal pool prediction. It is important to note that we observed significant discrepancies between model accuracy and map accuracy, illustrating the pitfall of relying on statistical metrics alone (e.g., R² values) to assess the quality of spatial analyses.

Introduction

Motivation
Vernal pools are ephemeral wetlands that are biologically active primarily during the spring and summer months. Many amphibian and invertebrate species have evolved life history strategies to utilize the distinctive characteristics of vernal pools and the surrounding uplands. These unique ecosystems contribute disproportionately to the biodiversity of the landscape by providing a range of wetland habitat for a variety of rare and endangered species. Despite their small size, the significance of vernal pools in maintaining the diversity of the overall landscape is large (Calhoun and deMaynadier, 2008; Leibowitz, 2003; Semlitsch and Bodie, 1998). In addition to biodiversity functions, vernal pools also promote flood control, improve water quality, and stabilize soils by intercepting sediment, nutrient rich run-off, and precipitation (Leibowitz, 2003; Wolfson et al., 2002).

For most purposes, including protective legislation, vernal pools are primarily defined by the wildlife found within them, rather than by their physical features (i.e., obligate and facultative species) (Table 1 and Table 2). Most pools, however, have some basic physical attributes in common: they are small, depressional basins; they are geographically isolated from other wetlands; and they exhibit cyclical periods of inundation and drying. Their small size, seasonal nature, and isolation from other, larger water bodies, suggest increased risk of degradation or loss by development, as they are often left unprotected under wetland legislation (Calhoun and deMaynadier, 2008).

The vulnerability of vernal pools to development and the fragmentation of adjacent uplands have led to extensive efforts to identify them in the landscape. Massachusetts, in particular, has been a pioneer in accepting the difficult issues surrounding vernal pool identification and protection; it was one of the first states in the nation to pass regulations that specifically protect vernal pool habitat (Burne and Griffin, 2005).

While legislation is an important step in the process of safeguarding vernal pools, a complete inventory of vernal pool locations across the landscape is critical to begin effective enforcement of these regulations. Until 2001, vernal pool identification in Massachusetts relied almost exclusively upon discovery and certification by interested citizens, resulting in patchy distributions of known pools. Early efforts to identify vernal pools on a regional scale included photointerpretation of infrared aerial photography in the Quabbin Reservoir in central Massachusetts by Brooks et al. (1998). An intensive effort was made to more completely identify potential vernal pools on a statewide scale using photointerpretation in 2001 (Burne, 2001). While photointerpretation is considered to be relatively fast and effective for pool detection across the landscape, there are other methods that may prove to be more time and cost efficient. For example, Frohn et al. (2009) used object-oriented classification of Landsat-7

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING January 2013

imagery to classify individual, isolated wetlands larger than 0.20 ha. Similarly, a spatial-statistical (GIS) modeling approach may provide an objective and less labor-intensive solution for preliminary identification of potential vernal pool locations over large geographic areas, while at the same time revealing important ecological and spatial characteristics.

TABLE 1. OBLIGATE VERNAL POOL SPECIES; TABLE ADAPTED FROM COMMONWEALTH OF MASSACHUSETTS DIVISION OF FISHERIES AND WILDLIFE (2001)

MA Breeding Obligate Species
  Wood Frog (Rana sylvatica)
  Spotted Salamander (Ambystoma maculatum)
  Blue-spotted Salamander (Ambystoma laterale)**
  Jefferson Salamander (Ambystoma jeffersonianum)**
  Marbled Salamander (Ambystoma opacum)**
  Eastern Spadefoot Toad (Scaphiopus holbrookii)**
  Fairy Shrimp (Eubranchipus spp.)

**State Listed Species

TABLE 2. FACULTATIVE VERNAL POOL SPECIES; TABLE ADAPTED FROM COMMONWEALTH OF MASSACHUSETTS DIVISION OF FISHERIES AND WILDLIFE (2001)

MA Facultative Species

Amphibians
  Breeding            Spring Peeper (Pseudacris crucifer)
  Breeding            Gray Tree Frog (Hyla versicolor)
  Breeding            American Toad (Bufo americanus)
  Breeding            Fowler's Toad (Bufo fowleri)
  Breeding            Green Frog (Rana clamitans melanota)
  Breeding            Pickerel Frog (Rana palustris)
  Breeding            Leopard Frog (Rana pipiens)
  Adult or Breeding   Red-spotted Newt (Notophthalmus v. viridescens)
  Breeding            Four-toed Salamander (Hemidactylium scutatum)**

Reptiles
  Spotted Turtle (Clemmys guttata)**
  Blandings Turtle (Emydoidea blandingii)**
  Wood Turtle (Clemmys insculpta)**
  Painted Turtle (Chrysemys picta)
  Snapping Turtle (Chelydra serpentina)

Invertebrates
  Predaceous Diving Beetle Larvae (Dytiscidae)
  Water Scorpion (Nepidae)
  Dragonfly Larvae (Odonata: Anisoptera)
  Damselfly Larvae (Odonata: Zygoptera)
  Dobsonfly Larvae (Corydalidae)
  Whirligig Beetle Larvae (Gyrinidae)
  Caddisfly Larvae (Trichoptera)
  Leeches (Hirudinea)
  Freshwater (fingernail) Clams (Pisidiidae)
  Amphibious, Air-breathing Snails (Basommatophora)

**State Listed Species

Habitat Modeling
Statistical modeling of environmental processes, ecosystem dynamics, and species and habitat distributions is a common approach for acquiring information about rare, declining, or otherwise important species or ecosystems. The ecological data used to compile such models are generally multivariate and location specific (Vogiatzakis, 2003). Therefore, ecological problems lend themselves well to the use of Geographic Information Systems (GIS) and remotely sensed imagery. Most commercial spatial analysis software packages, however, lack the statistical capabilities necessary to examine complex modeling problems, while most statistical software lacks important spatial components (Vogiatzakis, 2003). For these reasons, many studies have been limited to examining only one (the spatial or the statistical) part of an ecological problem.

The main objective of this study is to statistically determine the physical predictors of vernal pool occurrence in Massachusetts, and to use the statistical models to generate maps, or cartographic/spatial models, for use as conservation tools. Logistic regression and Classification And Regression Tree (CART) analyses were chosen as the statistical modeling approaches. Quantitative assessments were performed that compared not only the statistical results, but the cartographic results as well.

Methods

Study Area
The study area is in the State of Massachusetts. Massachusetts is the most populous state in , although only about 10 percent of the state is developed (MassGIS, 2002). The climate in Massachusetts is temperate with mild, humid summers and cold, snowy winters (NOAA National Climatic Data Center, 2005). Over half of the state is forested (MassGIS, 2002), the land cover type most commonly associated with vernal pools. Forests in Massachusetts are generally classified as “Deciduous Forest Land” and/or “Mixed Forest Land” (Anderson et al., 1976). Deciduous forests in Massachusetts are most often composed of the following tree species: red maple (Acer rubrum), oak (Quercus spp.), birch (Betula, spp.), and American beech (Fagus grandifolia). The most common evergreens in Massachusetts are eastern hemlock (Tsuga canadensis) and white pine (Pinus strobus).

Training and Validation Study Sites
Training and validation sites were chosen by analyzing the Certified Vernal Pool (CVP) layer across Massachusetts (National Heritage and Endangered Species Program, 2002). This statewide layer was searched for assemblages of pools with similar geography and vernal pool density to represent training and validation groups. Convex hulls were generated around each assemblage (Jenness, 2004) to create eight representative areas: four training groups and four validation groups for each of the models (Table 3). The training areas totaled 9,145 ha, and the validation areas totaled 8,911 ha. The sites were generally located in the northeastern part of the state, as the number and density of known pools was higher in these areas. The sites covered parts of Essex County, eastern Middlesex County, northern Middlesex County, and eastern Worcester County (Figure 1).

Modeling Framework
Guisan and Zimmermann’s (2000) modeling framework was utilized in this study, whereby a conceptual model was formulated based upon potential model inputs (i.e., identifying possible predictors of vernal pool presence). Potential predictors were chosen from information gathered during extensive literature review and from field experience (Figure 2). Statistical models were then identified, tested, calibrated, and evaluated using the training and validation data sets.


TABLE 3. TRAINING AND VALIDATION AREA DESCRIPTIONS

ID             Town          # CVPs   Area (Acres)   Density (pools/acre)   Description
Train 1        Boxford       59       4663           0.013                  Northeast Massachusetts, Essex County
Validation 1   N. Andover    63       3564           0.018                  Northeast Massachusetts, Essex County
Train 2        Georgetown    71       4879           0.015                  Northeast Massachusetts, Middlesex & Essex Counties
Validation 2   Reading       69       5527           0.012                  Northeast Massachusetts, Middlesex & Essex Counties
Train 3        S. Westford   44       7578           0.006                  Northern Massachusetts, Middlesex County
Validation 3   N. Westford   44       5951           0.007                  Northern Massachusetts, Middlesex County
Train 4        Sterling      40       5478           0.007                  Central Massachusetts, Worcester County
Validation 4   Bolton        39       6978           0.006                  Central Massachusetts, Worcester County
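Table 3 reports areas in acres, while the text reports training and validation totals in hectares. The short sketch below checks that the two agree and recomputes the pool densities. The site values are copied from Table 3; the conversion factor (0.404686 ha per acre) and all function names are illustrative choices, not part of the original study.

```python
# Hectares per acre (standard conversion factor; not stated in the paper).
ACRE_TO_HA = 0.404686

# (id, town, certified vernal pools, area in acres), copied from Table 3.
SITES = [
    ("Train 1", "Boxford", 59, 4663),
    ("Validation 1", "N. Andover", 63, 3564),
    ("Train 2", "Georgetown", 71, 4879),
    ("Validation 2", "Reading", 69, 5527),
    ("Train 3", "S. Westford", 44, 7578),
    ("Validation 3", "N. Westford", 44, 5951),
    ("Train 4", "Sterling", 40, 5478),
    ("Validation 4", "Bolton", 39, 6978),
]


def density(pools, acres):
    """Pools per acre for one site."""
    return pools / acres


def total_hectares(prefix):
    """Sum the area of every site whose id starts with `prefix`, in hectares."""
    acres = sum(area for site_id, _, _, area in SITES if site_id.startswith(prefix))
    return acres * ACRE_TO_HA


train_ha = total_hectares("Train")
valid_ha = total_hectares("Validation")
print(f"training area:   {train_ha:.0f} ha")
print(f"validation area: {valid_ha:.0f} ha")
for site_id, town, pools, acres in SITES:
    print(f"{site_id:13s} {town:12s} {density(pools, acres):.3f} pools/acre")
```

Under these assumptions the hectare totals can be compared directly with the figures quoted in the Training and Validation Study Sites paragraph.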

Figure 1. Model training and validation areas used for model creation, calibration, and evaluation. Lower right inset map depicts field validation areas.

Data
In support of the conceptual model in Figure 2, a number of GIS data layers were gathered in order to compile physical information about known vernal pools. Based on data relevance and availability, a total of eight independent variables were tested in this study: blue light reflectance, green light reflectance, red light reflectance, slope, aspect, land use, soil type, and soil drainage class (Table 4). These variables were chosen because of their perceived importance in predicting vernal pool locations, and their ready availability in spatial (GIS) format.

All layers were projected to Massachusetts State Plane, NAD83, meters. Of utmost importance was the National Heritage and Endangered Species Program (NHESP) Certified Vernal Pool (CVP) point layer. It was chosen to represent known vernal pools that have been documented and certified by the NHESP (National Heritage and Endangered Species Program, 2002). The locations of these pools were assumed to be certain; thus they were used as the dependent variable in the modeling exercises. Overall, there were 198 CVP points used as training data, and 205 CVP points used for model validation.

Additionally, the NHESP Potential Vernal Pool (PVP) layer was also utilized. This layer represents unverified vernal pools identified by photo interpretation of 1:12 000 color-infrared, spring, leaf-off aerial photography (as discussed in Burne, 2001). The layer is comprised of more than 29,000 potential pools (National Heritage and Endangered Species Program, 2000), and was used as part of the model validation.


Figure 2. Conceptual model of potential vernal pool predictors. Symbols “+” and “−” indicate positive or negative correlations to vernal pool presence.

For model building and validation, vernal pool absence was as important as vernal pool presence (Barbet-Massin et al., 2012). Both logistic regression and Classification and Regression Tree analyses require identification of the conditions under which the dependent variable is present as well as absent. A point layer containing 400 “absent” points was created: fifty points were chosen in each of the training and validation sites. Points were selected based upon both field knowledge of the study area and photo interpretation of MassGIS 1:5 000 (0.5 m) color orthophotos. Slope, National Wetlands Inventory (NWI), and land use were used as supplemental layers for decision-making. Areas with characteristics similar to those of vernal pools (e.g., other wetlands, shadows, etc.) were not chosen as absent sites, as initial testing indicated with reasonable certainty that the modeling techniques and inputs used in this study would not be able

TABLE 4. VERNAL POOL PREDICTOR LAYERS

Layer*                      Resolution/MMU†       Notes
True Color Ortho Imagery    0.5 m                 Three bands, flown in April, 2001.
Slope                       5 m                   Calculated in degrees.
Aspect                      5 m
Land Use (1999)             1 acre (0.4047 ha)    Aggregated into 4-classes (Forest, Wetland, Field/Open, Developed) and 5-classes (Forest, Wetland, Field/Open, Low-density Development, High-density Development).
USDA - NRCS Soil Type       3 acres (1.21 ha)     Aggregated classes: fine sandy loam, loamy, loamy sand, sandy, muck, urban, rock outcrop.
USDA - NRCS Soil Drainage   3 acres (1.21 ha)     Aggregated classes: excessively drained, well-drained, poorly drained, and very poorly drained.

*All layers acquired from MassGIS. † Minimum Mapping Unit


to distinguish these types of points from vernal pools. Since the objective of the modeling exercise was to identify as many potential vernal pools as possible, we were willing to accept more commission errors (i.e., falsely predicting non-vernal pools, such as shadows and other wetlands, as vernal pools) in exchange for fewer errors of omission. In other words, the risk of overlooking potentially critical vernal pools outweighed the cost of labeling non-vernal pools as vernal pools.

Statistical Analyses
This study was performed on two levels. First, strictly statistical models (with no spatial component) were created to explain the conditions under which vernal pools are typically found in the landscape. Logistic regression and Classification and Regression Tree modeling techniques were used. These statistical models were developed using the training data and tested using the validation data as described below. Once the final statistical models were selected, they were translated into cartographic models or maps. These maps were then spatial representations of the statistical models. While the accuracies of the statistical models and the cartographic models/maps were expected to be similar, assessments were conducted on both to test that assumption.

The logistic regression modeling technique was utilized in an attempt to build upon the work of Grant (2005). Logistic regression is a specific type of Generalized Linear Model (GLM) in which the dependent variable is binomial. GLMs are more flexible than simple linear models because they are appropriate for parametric and non-parametric data. Modeling routines that handle nonparametric data are often better suited for analyzing ecological relationships than methods that assume a classical Gaussian distribution (Guisan et al., 2002; Guisan and Zimmermann, 2000; Lehmann, 1998). Logistic regression uses a logistic link (logit transformation) that can fit polynomial equations to non-linear data (Hirzel et al., 2001). It allows the user to predict a discrete outcome (i.e., presence/absence) from a set of categorical or continuous predictors (Guisan et al., 2002; Lehmann, 1998), though it has difficulty modeling complex interactions between variables.

In order to determine which combination of the eight independent variables would provide the best model, Akaike’s Information Criterion (AIC) was computed for each model (Akaike, 1979; Bonney, 1987). The choice not to automatically include all variables in the model was made to ensure that the model selected from the numerous possible models was both the simplest and the most effective. It is well documented that as the number of parameters increases, the accuracy with which they can be predicted decreases (Bonney, 1987). Specifically, more predictor variables mean increased likelihood of multicollinearity (Muñoz and Felicísimo, 2004). In order to reduce the probability of such effects, a stepwise logistic regression was performed on the training data set using SAS 9.1. Parameters for entry into the model were relaxed (significance level 0.99) to ensure that all independent variables could enter the model and AIC could be assessed at each level. Liberal entry parameters were used for data exploration only (see Hosmer and Lemeshow (2000) for examples). The model with the smallest AIC value was the best-fit model and included the green band from the aerial imagery, land use (four categories), and slope.

Based upon the findings from the stepwise logistic regression, the three independent variables that produced the best model were entered into the SAS PROC LOGISTIC routine. A prediction table was generated for the independent validation set of CVPs, with each pool assigned a probability value from 0 to 100 percent. A cut-off value of 50 percent was used to evaluate predictive success; Grant (2005) calculated 53 percent as a threshold.

The second statistical modeling routine was Classification and Regression Tree (Breiman et al., 1984). CART was chosen as a modeling technique because of its ability to handle nonparametric data and both continuous and categorical variables. Further, it is especially appropriate for spatial translation, as prediction rules can be directly induced from the model results (Guisan and Zimmermann, 2000), and hierarchical relationships between independent variables are explicitly illustrated by the tree structure (North et al., 1999). CART models recursively split predictor variables into a hierarchical sequence of groups based upon the independent variables’ ability to predict the response (Andersen et al., 2000). The routine iteratively splits the data into two groups, based upon the variable that most reduces the deviance in the dependent variable, until it finally yields predictions at the terminal nodes (Iverson and Prasad, 1998; Lawrence and Wright, 2001). With only eight independent variables, and since CART examines all explanatory variables at each step, it was less critical to predetermine which predictor variables were considered for the model. For this reason, all variables were added to the model: slope, aspect, the red, green, and blue bands of imagery, land use (the 5-class aggregation performed best), soil type, and soil drainage.

The CART analyses were completed in S+. The defaults of the S+ CART modeling routine were maintained, meaning that splits occurred only if there were more than five observations in a node before a split, and terminal nodes were achieved when either the total number of observations for a particular node was less than ten, or when the deviance of the node was less than 1 percent of the total tree deviance. Since an unrestricted CART analysis will generally over-fit the model to even the slightest variations and noise specific to the training data set, cost complexity and cross-validation pruning methods were tested in an attempt to make the models more robust. Both methods yielded similar results because the classification trees were relatively simple. Further, no pruning scenario kept the deviance similar to that of the original tree while significantly reducing the number of end nodes, meaning that pruning would have reduced the explanatory power of the analysis. After exploring multiple pruning scenarios and testing each output on the independent validation set, the tree with the fewest misclassification errors on the validation set was chosen: the final tree remained unpruned.

Like the logistic regression, a prediction table was output based upon the statistical results of the analysis. It was evaluated in the same manner as the logistic regression output; a successful prediction was determined by a 50 percent or higher probability.

Cartographic Modeling
In landscape-scale spatial-environmental modeling, reliance on statistical analyses alone is not sufficient. For management purposes especially, the process of integrating the statistical analyses into a spatial context (i.e., map) is very important. Understanding if/how certain statistics translate onto the landscape is the crux of environmental modeling, and it is often overlooked. In this study, both the logistic regression model and the CART model were used to create predictive maps of vernal pools across the validation study sites. The cartographic outputs/maps were compared to the statistical ones and differences in accuracies were recorded, if present.

Using the results from the PROC LOGISTIC model, a map of the model output was created. It was weighted based upon the strength of the correlation between predictor variables and vernal pool locations (Sperduto and Congalton,


1996). Weighting of the independent variables was based on the maximum likelihood coefficients in the logistic regression output. The inverse logistic transformation was used to create a map with probability values ranging from 0 to 1 (Equation 1) (Guisan and Zimmermann, 2000):

$$\mathrm{PVPO} = \frac{\exp(LP)}{1 + \exp(LP)} \qquad (1)$$

where PVPO is the Probability of Vernal Pool Occurrence, and LP is the linear predictor fitted by the logistic regression. Using the three input layers (green light reflectance, land use, and slope), the spatial equation was performed in ArcGIS®. The resulting map represented the probability of vernal pool occurrence across the validation study areas. Probabilities were classified as: (a) Low (0 to 49 percent), and (b) High (50 to 100 percent). The output resolution was 5 m.

Similar to the logistic regression outputs, predictive maps were created for the CART analysis. The CART maps were much more complex, as they required a query for each node on the tree that led to a “present” prediction. Each node effectively represented a rule for determining the presence or absence of vernal pools. Queries were written from the root node, through a series of intermediate nodes, to each terminal node. The result was a 5 m map predicting vernal pool presence.

Cartographic Model Validation
Error analysis of spatial data means computing overall map accuracy as well as individual class accuracies. Class accuracies are assessed by calculating producer’s and user’s accuracies, or quantifying errors of omission and commission, respectively (Story and Congalton, 1986; Congalton and Green, 2009). Producer’s accuracy refers to how well the map is able to predict an independent reference data set. User’s accuracy is the probability that a sample from the map actually represents the same category on the ground. Calculating both types of class accuracy is imperative for full understanding of the quality of a map. For this study, overall and producer’s accuracies were calculated using error matrices. Due to the method of absent-point selection in this study, the user’s accuracy, as classically calculated in an error matrix, is not applicable. Because certain types of areas were purposely avoided (i.e., shadows and other wetlands), the user’s accuracy would not be a true representation of the likelihood that a given vernal pool from the map in fact represents a vernal pool on the ground. Therefore, in order to achieve some estimate of user’s accuracy, we conducted a field validation of a subset of the study area.

Field Effort
Despite having ample validation data, field checking the model outputs for a portion of the validation area was warranted both in order to more fully understand model errors and to validate new pools identified by the models that were not in the validation data set. We were especially interested in errors due to shadows or other features often confused with vernal pools, since these features were not identified as absent points in the training data and would not otherwise be evident in the accuracy assessment.

Field validation of model outputs was completed on a subset of the total study area. The validation polygons were randomly subset into four rectangular field validation sites per aggregation and totaled 10 percent of the total validation area (891 ha) (see inset of Figure 1). Each model output polygon was visited in the field. An area was considered a vernal pool if it was a confined depression with no obvious, permanent inlet or outlet of surface water and was flooded or had evidence of flooding. Unlike the Massachusetts definition, we did not require the presence of vernal pool species, but considered the physical characteristics and the potential to serve as breeding habitat for vernal pool species. The model output polygon metrics are not exact measures of user’s accuracy, because multiple polygons overlapping a single vernal pool are all correct predictions (i.e., the number of correct polygons is not a proxy for the number of vernal pools detected in the landscape); however, these data provided an estimate of commission error that was more representative than using the validation data alone.

Results

Logistic Regression Statistical Results
The logistic regression model produced a maximum rescaled R² value of 0.85. The Akaike’s Information Criterion and the -2 Log Likelihood both decreased significantly with the addition of the three independent variables to the model, indicating that the predictor variables improved the overall model fit (Table 5). The Chi Square value (p < 0.0001) and the Hosmer-Lemeshow Goodness of Fit Test value (0.7980) further illustrated that the addition of the chosen covariates to the model significantly improved the overall model fit.

The “Analysis of Effects” indicated that all three variables were significant in predicting the presence of vernal pools (p < 0.05). The odds ratios provided a method of describing the nature and strength of the relationship between each predictor variable and the presence of vernal pools. The odds ratios for slope and band 2 (green) of the imagery were less than one (0.789 and 0.959, respectively), indicating an inverse relationship between each of these variables and vernal pool occurrence (Table 6). The sign of the maximum likelihood estimates for both slope and green light reflectance further supported the inverse relationship revealed by the analysis of effects: both were negative.

TABLE 5. LOGISTIC REGRESSION MODEL FIT STATISTICS

Model Fit Statistics - Best Model
Criterion           Intercept Only   Intercept and Covariates
AIC                 552.356          156.658
-2 Log Likelihood   550.356          144.658

TABLE 6. LOGISTIC REGRESSION MAXIMUM LIKELIHOOD ESTIMATES

Maximum Likelihood Estimates
Effect                     Coefficient   Significance
Intercept                  3.1211        <0.0001
Slope                      -0.2372       0.0003
Green Light Reflectance    -0.0423       <0.0001
Land use - Development     -1.2268       0.0087
Land use - Forest          1.5446        0.0003
Land use - Field/Open      0.2268        0.6650
Land use - Wetland         1*            N/A

*Reference variable to which all other categorical variables are compared.

The odds ratios of the land use categories were interpreted differently than the continuous variables. In logistic regression, categorical variables are divided into dummy variables, with one class acting as the reference class to


TABLE 7. ERROR MATRIX FOR LOGISTIC REGRESSION STATISTICAL RESULTS

                                Reference Data
Classification        Vernal Pool   Absent    Total   User's Accuracy
Vernal Pool           176           9         185     N/A
Absent                29            191       220     N/A
Total                 205           200       405
Producer's Accuracy   85.85%        95.50%
Overall Accuracy      90.62%
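The fit statistics in Table 5 and the odds ratios quoted in the text can be recomputed from the reported numbers. The sketch below is illustrative only. It assumes AIC = -2 Log Likelihood + 2k, with k = 6 fitted parameters for the best model (intercept, slope, green band, and three land use terms), and it assumes the land use variable was effect coded, so that the wetland coefficient equals the negative sum of the other three land use coefficients. The coding scheme is not stated in the paper; it is adopted here because it is consistent with the ratios reported for forest, field/open, and developed land relative to wetland.

```python
import math

# Values copied from Table 5.
NEG2LL_INTERCEPT_ONLY = 550.356
NEG2LL_BEST_MODEL = 144.658

# Coefficients copied from Table 6.
SLOPE = -0.2372
GREEN = -0.0423
LAND_USE = {"developed": -1.2268, "forest": 1.5446, "field_open": 0.2268}


def aic(neg2ll, n_params):
    """Akaike's Information Criterion from -2 log likelihood and parameter count."""
    return neg2ll + 2 * n_params


# Odds ratios for the continuous predictors: exp(coefficient) per unit increase.
or_slope = math.exp(SLOPE)
or_green = math.exp(GREEN)

# ASSUMPTION: effect (deviation-from-mean) coding, so the reference level's
# implied coefficient is minus the sum of the estimated land use coefficients.
wetland = -sum(LAND_USE.values())
or_vs_wetland = {name: math.exp(beta - wetland) for name, beta in LAND_USE.items()}

print("AIC, intercept only:", round(aic(NEG2LL_INTERCEPT_ONLY, 1), 3))
print("AIC, best model:    ", round(aic(NEG2LL_BEST_MODEL, 6), 3))
print("odds ratio, slope:  ", round(or_slope, 3))
print("odds ratio, green:  ", round(or_green, 3))
for name, ratio in or_vs_wetland.items():
    print(f"odds ratio, {name} vs wetland: {ratio:.1f}")
```

If the model had instead used reference (dummy) coding, the land use odds ratios would simply be exp(coefficient), which does not match the reported values; that mismatch is the reason for the coding assumption above.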

TABLE 8. ERROR MATRIX FOR CART STATISTICAL RESULTS

                                Reference Data
Classification        Vernal Pool   Absent    Total   User's Accuracy
Vernal Pool           190           18        208     N/A
Absent                15            182       197     N/A
Total                 205           200       405
Producer's Accuracy   92.68%        91.00%
Overall Accuracy      91.85%
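The CART routine described in the Methods chooses, at each node, the split that most reduces deviance in the response. The following is a minimal single-variable sketch of that idea, not the S+ implementation used in the study: it scores each candidate threshold by the summed multinomial deviance of the two child nodes. The slope values and labels are hypothetical and exist only to exercise the functions.

```python
import math


def node_deviance(labels):
    """Multinomial deviance of a node: -2 * sum(n_k * ln(n_k / n)) over classes."""
    n = len(labels)
    deviance = 0.0
    for cls in set(labels):
        n_k = labels.count(cls)
        deviance -= 2.0 * n_k * math.log(n_k / n)
    return deviance


def best_split(values, labels):
    """Return (threshold, child_deviance) for the best binary split on one variable."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no threshold separates identical values
        threshold = (low + high) / 2.0
        left = [lab for val, lab in pairs if val < threshold]
        right = [lab for val, lab in pairs if val >= threshold]
        deviance = node_deviance(left) + node_deviance(right)
        if best is None or deviance < best[1]:
            best = (threshold, deviance)
    return best


# HYPOTHETICAL data: slope in degrees, 1 = vernal pool present, 0 = absent.
slopes = [1.0, 2.0, 2.5, 3.0, 8.0, 9.0, 10.0, 12.0]
present = [1, 1, 1, 1, 0, 0, 0, 0]

root = node_deviance(present)
threshold, child = best_split(slopes, present)
print(f"root deviance {root:.2f}, best threshold {threshold}, child deviance {child:.2f}")
```

A full tree would repeat this search over every predictor at every node and stop according to the node-size and deviance thresholds described in the Methods.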

which all other classes are compared. In this regression, the “wetland” class was the reference category. The most dramatic contrast in vernal pool presence was between forested land and wetlands. A vernal pool was 8.1 times more likely to occur in a forested area than in a wetland area (Table 6). Further, a vernal pool was 2.2 times more likely to occur in open lands than in wetlands, but only half as likely to occur on developed land. These results generally indicated that vernal pools were most commonly found in forested areas and negatively associated with development.

An error matrix was generated from the independent validation predictions (Table 7), which yielded an overall accuracy of 90.6 percent. Of greatest interest was that 85.9 percent, or 176 of 205 certified vernal pools, were correctly predicted. As described above, the user’s accuracy is not applicable with regard to the validation data and was instead assessed in the field.

CART Statistical Results
The CART model had 20 terminal nodes, of which nine predicted vernal pool presence. Like the logistic regression routine, the CART model’s performance was also evaluated on an independent validation set using an error matrix (Table 8). It had an overall accuracy of 91.9 percent (372 of 405 points) and correctly predicted 92.7 percent (190 of 205) of validation pools.

Logistic Regression Cartographic Modeling Results
The inverse logistic transformation (Equation 1) was used to create a vernal pool probability map from the logistic regression statistical output:

$$\mathrm{PVPO} = \frac{\exp(LP)}{1 + \exp(LP)} \qquad (1)$$

where

$$LP = 3.211 - 0.2372\,(\text{Slope}) - 0.0423\,(\text{Band 2}) - 1.2268\,(\text{Developed Land}) + 1.5446\,(\text{Forested Land}) + 0.2266\,(\text{Field/Open Land}) + 1\,(\text{Wetland}).$$

Weighting was based upon the sign and magnitude of the logistic regression coefficients (Table 6). Output maps were generated for each of the four validation study areas (e.g., Figure 3). NWI wetlands were removed from the model output because the goal of the study was to map vernal pools and not permanent wetlands. In addition, for conservation purposes, predicting vernal pools in wetland areas was considered unnecessary, as larger wetlands are already safeguarded through wetland protection legislation.

Like the statistical models, error matrices were used to evaluate cartographic model performance (Congalton and Green, 2009) (Table 9). The weighted logistic regression cartographic model correctly predicted 111/205 vernal pool points (54.2 percent). The overall accuracy of this model was 74.32 percent.

CART Cartographic Modeling Results
The CART model was interpreted as a spatial combination of each of the nine terminal nodes predicting vernal pool presence and is shown in Figure 4.

The spatial evaluation of the CART analysis was largely based upon the model’s ability to predict the independent set of validation pools. The CART model was able to correctly predict 199/205 CVP validation pools (97.1 percent) and 188/200 absent points (94 percent), which represents an overall accuracy of 95.56 percent (Table 10).

Field Effort
The field effort was completed to determine if the predicted areas on the map were representative of what was on the ground (user’s accuracy) (Congalton and Green, 2009). Since

TABLE 9. ERROR MATRIX FOR LOGISTIC REGRESSION CARTOGRAPHIC MODEL

                         Reference Data
Classification           Vernal Pool    Absent     Total    User's Accuracy
Vernal Pool              111            10         121      N/A
Absent                   94             190        284      N/A
Total                    205            200        405
Producer's Accuracy      54.15%         95.00%
Overall Accuracy         74.32%

PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING January 2013

Figure 3. Logistic regression weighted cartographic model: North Andover study site.
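As a rough illustration of how the probability surface behind Figure 3 could be computed for a single raster cell, the following minimal sketch applies the inverse logistic transformation of Equation 1 to the linear predictor LP given in the text. The coefficients are those printed in the text; the function names, the treatment of the land-use terms as mutually exclusive indicators, and the example cell values are assumptions made only for illustration, and the input units are not specified here.

```python
import math

# Coefficients as printed in the text for the linear predictor LP.
INTERCEPT = 3.211
COEF_SLOPE = -0.2372
COEF_BAND2 = -0.0423

# Land-use weights as printed in the text. ASSUMPTION: the land-use terms
# act as mutually exclusive indicators, so exactly one applies to a cell.
LAND_USE_WEIGHT = {
    "developed": -1.2268,
    "forested": 1.5446,
    "field_open": 0.2266,
    "wetland": 1.0,
}


def linear_predictor(slope, band2, land_use):
    """Return LP for one cell (hypothetical helper; input units assumed)."""
    return (INTERCEPT
            + COEF_SLOPE * slope
            + COEF_BAND2 * band2
            + LAND_USE_WEIGHT[land_use])


def vernal_pool_probability(lp):
    """Inverse logistic transformation: exp(LP) / (1 + exp(LP))."""
    return math.exp(lp) / (1.0 + math.exp(lp))


# Hypothetical cell values, invented for the example.
lp = linear_predictor(slope=2.0, band2=60.0, land_use="forested")
p = vernal_pool_probability(lp)
print(f"LP = {lp:.4f}, probability = {p:.3f}")
```

In a GIS workflow the same arithmetic would be applied cell by cell across the input rasters; the scalar version is shown only to keep the sketch self-contained.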

TABLE 10. ERROR MATRIX FOR CART CARTOGRAPHIC RESULTS

                         Reference Data
Classification           Vernal Pool    Absent     Total    User's Accuracy
Vernal Pool              199            12         211      N/A
Absent                   6              188        194      N/A
Total                    205            200        405
Producer's Accuracy      97.07%         94.00%
Overall Accuracy         95.56%
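The producer's and overall accuracies in Tables 9 and 10 follow directly from the cell counts. The sketch below recomputes them; the function name and the matrix layout (rows as mapped class, columns as reference class) are assumptions chosen to match the tables. User's accuracy is left out because the tables report it as N/A and it was estimated in the field instead.

```python
def accuracy_summary(matrix):
    """Return (producer's accuracies, overall accuracy) for an error matrix.

    matrix[i][j] is the number of points mapped as class i whose
    reference class is j (hypothetical helper for illustration).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Column totals are the reference totals for each class.
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]
    overall = sum(matrix[k][k] for k in range(n)) / total
    return producers, overall


# Rows: mapped Vernal Pool, mapped Absent.
# Columns: reference Vernal Pool, reference Absent.
table_9 = [[111, 10], [94, 190]]   # logistic regression cartographic model
table_10 = [[199, 12], [6, 188]]   # CART cartographic model

for name, matrix in (("Table 9", table_9), ("Table 10", table_10)):
    producers, overall = accuracy_summary(matrix)
    print(name,
          [f"{100 * p:.2f}%" for p in producers],
          f"overall {100 * overall:.2f}%")
```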

the logistic regression mapping exercise produced unfavorable results (54 percent accuracy predicting vernal pools), only the CART model was evaluated in the field. As expected, there was high commission error, as many areas of shadow were misclassified as vernal pools (Table 11). The spectral similarity between shadow and water complicates the analysis and is inevitable, given the input variables.

Discussion

The goal of this project was to use a combination of spatial data and remotely sensed imagery in a modeling approach to create a cost and time efficient method of inventorying vernal pools. These techniques improve upon past identification methods by focusing photo interpretation and field efforts in areas where vernal pools are likely to exist. The methods chosen to model vernal pool locations were logistic regression and classification and regression tree analysis. During cartographic implementation (map generation) of the statistical models, differences in statistical versus spatial accuracies were observed. Both logistic regression and CART had favorable statistical results; however, logistic regression’s cartographic results were far inferior to CART’s.


Figure 4. CART cartographic model: North Andover study site.
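The CART map is described as a spatial combination of the terminal nodes that predict vernal pool presence, each of which can be expressed as a logical query. A minimal sketch of that idea follows. The rules, thresholds, and cell values are hypothetical placeholders (the study's actual rule set is not reproduced here), and the grid of attribute dictionaries stands in for co-registered raster layers.

```python
# Each rule stands for one terminal node that predicts "vernal pool" and is
# written as a logical query on a cell's attributes. The thresholds below are
# HYPOTHETICAL placeholders, not the rules produced by the study's CART model.
HYPOTHETICAL_POOL_RULES = [
    lambda cell: cell["band2"] < 20 and cell["slope"] < 3.0,
    lambda cell: cell["land_use"] == "forested" and cell["slope"] < 1.0,
]


def predicts_pool(cell, rules=HYPOTHETICAL_POOL_RULES):
    """A cell is mapped as a pool if ANY pool-predicting node's rule holds."""
    return any(rule(cell) for rule in rules)


def pool_mask(grid, rules=HYPOTHETICAL_POOL_RULES):
    """Apply the rule set to every cell of a 2-D grid of attribute dicts."""
    return [[predicts_pool(cell, rules) for cell in row] for row in grid]


# Tiny illustrative grid (values invented for the example).
grid = [
    [{"band2": 15, "slope": 1.0, "land_use": "forested"},
     {"band2": 45, "slope": 6.0, "land_use": "developed"}],
    [{"band2": 30, "slope": 0.5, "land_use": "forested"},
     {"band2": 18, "slope": 4.0, "land_use": "field_open"}],
]
mask = pool_mask(grid)
print(mask)
```

Because each rule is an explicit query, the union of rules maps one-to-one onto the tree's pool-predicting nodes, with no weighting step in between.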

Limitations and Model Improvements

The CART model, while extremely successful at predicting vernal pool locations (low omission error), was not without limitations. It had very high commission error (low user’s accuracy), meaning that there were non-vernal pools falsely identified by the model. The majority of the commission errors were due to confusion with shadows; these errors are not evident in the error matrices because shadows were purposely avoided as absent points. Model validation in the field indicated that the approximate commission error was about 68 percent. While high commission error is never desirable, we were willing to accept these errors for the sake of low omission error, rather than overlook potentially critical vernal pools (Muñoz and Felicísimo, 2004).

TABLE 11. USER’S ACCURACY ESTIMATE BASED ON FIELD VALIDATION; A SMALL PERCENTAGE OF POOLS WERE INACCESSIBLE IN THE FIELD

% Correct    % NWI    % Incorrect
5.38         26.44    68.07

When viewed within the paradigm of modeling vernal pools for conservation purposes, the commission error becomes less problematic. These models should be regarded as tools for preliminary identification of vernal pools to facilitate and focus field or other investigations. With this goal in mind, a deeper examination of the severity of making commission errors is possible. Assuming that the results obtained from the field validation areas (891 ha) are applicable to the entire validation area (approximately 8,911 ha), the model output within the whole validation area can be evaluated. In the CART model, the total land area representing predicted vernal pools was 1,535 ha, or about 17 percent of the total validation area. While this area sounds large, it eliminates 83 percent of the total land area as unsuitable for vernal pool presence. Further, in keeping with the definition of vernal pools as isolated from other surface water sources, and for most conservation purposes, it is redundant to assess areas already identified by the National Wetlands Inventory. Removal of NWI areas from the analysis resulted in a total land reduction of 88 percent, with 1,106 ha (of the original 8,911 ha) of model area remaining for photo or field identification. Again, while a large percentage of the total model output was likely


committed to the wrong category, the model output itself was only a small fraction of the total validation area. Since the vast majority of the commission errors in this project were attributable to shadows, simple photo interpretation can quickly eliminate many erroneous polygons to further reduce searchable land area.

In addition to improving commission error, a deeper investigation into omission error should be conducted to more fully evaluate the efficacy of using models such as the ones described in this study. Arguably, omission errors are the most consequential in conservation models. Since the study areas chosen in Massachusetts can be generally characterized as mixed/deciduous forest, one of the most unfortunate limitations to this model is that it does not have the ability to identify pools that are beneath a thick tree canopy. For this reason, spring leaf-off imagery is most effective in this type of analysis. Of course, even with the optimal imagery, those pools beneath dense coniferous canopies are still undetectable and are a significant source of unaccounted-for omission error. Several authors have observed large errors of omission in vernal pool estimates derived from aerial photo interpretation. For example, Calhoun et al. (2003) estimated at least 27 percent omission error in mixed/deciduous forests (white pine, hemlock, red maple, red oak), much like the forests of northern Massachusetts. Van Meter et al. (2008) estimated that only 1.4 percent of the total number of vernal pools identified in their study area, using probabilistic field sampling, were also identified by photo interpretation and noted that almost all of the pools were under 100 percent canopy cover. While our study did not explicitly use photo interpretation for identification of vernal pools, the empirical models were largely based upon image reflectance values. One option for improved, remote prediction of vernal pools below dense canopy cover is to add radar images to the models. Bourgeau-Chavez et al. (2001) found that forested wetlands were detectable with L- and C-band Synthetic Aperture Radar (SAR).

Statistical versus Spatial Modeling

While both logistic regression and CART had favorable statistical results, their spatial counterparts did not perform as expected. The logistic regression functioned much better statistically than it did cartographically. The overall model had an R² value of 0.85. It was also successful at statistically predicting the independent validation data set. Cartographically, however, the model did not perform as well. The inability of logistic regression to capture complex and non-additive interactions between variables hindered accurate statistical interpretation and map generation. Consequently, very general information about the physical characteristics of vernal pools was exposed (e.g., vernal pools are negatively correlated with slope), but more intricate details were over-simplified. At the 50 percent threshold, the model statistically predicted 90 percent of the vernal pools in the validation data set (producer’s accuracy). When mapped and spatially evaluated, the map was able to predict only 54 percent of the same set of validation pools.

Like the logistic regression model, the CART model was able to statistically predict an extremely high percentage of validation pools (93 percent), and it performed even better spatially than it did statistically (97 percent). Likely, the success of the CART spatial and statistical models was due to the detailed unveiling of complex relationships between the predictor variables and the presence of vernal pools. Further, the CART routine had a 1:1 correlation between the statistical output (i.e., rules) and the cartographic output (i.e., logical queries), allowing easier translation from the statistical environment to the spatial one.

Statistical model accuracy is commonly assumed to translate directly to spatial accuracy; thus, there are many spatially explicit studies that report only statistical results. This assumption can prove to be financially and environmentally costly. The goal of habitat suitability/distribution models, for example, is often to inform land use management and policy regarding a particular habitat. Predictive maps based on statistical models often serve as the basis for designing habitat protection strategies, restoration, research, and other activities. The maps are frequently the most valuable and useful results, as they are used to make important, and often costly, planning decisions; yet map accuracy is rarely assessed.

The importance of determining if and how the statistical results translate into spatial ones should not be overlooked. For instance, in this study, the logistic regression routine produced a strictly statistical accuracy of about 86 percent. Had the modeling process ended at this step, the project would have been considered extremely successful; however, the cartographic model was only about 54 percent accurate in predicting the same validation pools. Without this vital second step, valuable time and resources may have been spent trying to implement this model in a real-world application.

Conclusions

The results of this study indicate that there is a correlation between vernal pools and the physical and spectral characteristics at vernal pool locations: slope, aspect, land use, soil type, and spectral reflectance were investigated. Overall, the cartographic outputs of the CART models had the highest accuracies both statistically and cartographically, and have the potential to be used in similar geographic regions for the detection of vernal pools.

Information interpreted from models, often including maps, is used to advise land managers and policy makers, making accurate representation of model data crucial. Given the importance of translating statistical models into products suitable for decision-making, appropriate model selection is also critical. The logistic regression performed well statistically, but the results were an over-simplification of the relationships between vernal pool presence and the input predictors that could not be easily converted into a GIS; therefore, its spatial model accuracy was much lower than expected. The CART models, however, were much more conducive to cartographic modeling. They produced a specific rule set that was directly queried in a GIS; there was no subjective interpretation of the results. The CART model was much better suited for this analysis than the logistic regression.

The results of this study indicate that model statistics, such as R² values, are not a proxy for map accuracy; statistical accuracy and map accuracy are not analogous in all situations. Depending upon the model approach, providing a cartographic output while reporting only statistical accuracy can be a costly misrepresentation of model results. This finding illustrates the importance of assessing both statistical and cartographic accuracy. More research is necessary to determine which, if any, modeling techniques consistently produce similar statistical and map accuracies. For those approaches that have discrepant model and map accuracies, map accuracy must always be evaluated.


References

Akaike, H., 1979. A Bayesian extension of the minimum AIC procedure of autoregressive model fitting, Biometrika, 66(2):237–242.
Andersen, M.C., J.M. Watts, J.E. Freilich, S.R. Yool, G.I. Wakefield, J.E. McCauley, and P.B. Fahnestock, 2000. Regression-tree modeling of desert tortoise habitat in the central Mojave Desert, Ecological Applications, 10(3):890–900.
Anderson, J.R., E.E. Hardy, J.T. Roach, and R.E. Witmer, 1976. Land Use and Land Cover Classification System for Use with Remote Sensor Data, Geological Survey Professional Paper 964, , D.C.
Barber, M., M.F. Jiguet, C. Albert, and W. Thuiller, 2012. Selecting pseudo-absence species distribution models: how, where, and how many?, Methods in Ecology and Evolution, 3(2):327–338.
Bonney, G.E., 1987. Logistic regression for dependent binary observations, Biometrics, 43:951–973.
Bourgeau-Chavez, L.L., E.S. Kasischke, S.M. Brunzell, J.P. Mudd, K.B. Smith, and A.L. Frick, 2001. Analysis of space-borne SAR data for wetland mapping in Virginia riparian ecosystems, International Journal of Remote Sensing, 22(18):3665–3687.
Breiman, L., J. Friedman, C.J. Stone, and R.A. Olshen, 1984. Classification and Regression Trees, New York: Chapman and Hall.
Brooks, R.T., 2004. Weather-related effects on woodland vernal pool hydrology and hydroperiod, Wetlands, 24(1):104–114.
Brooks, R.T., J. Stone, and P. Lyons, 1998. An inventory of seasonal forest ponds on the Quabbin Reservoir watershed, Massachusetts, Northeastern Naturalist, 5(3):219–230.
Burne, M.R., 2001. Massachusetts Aerial Photo Survey of Potential Vernal Pools, NHESP: Westborough, Massachusetts.
Burne, M.R., and C.R. Griffin, 2005. Protecting vernal pools: A model from Massachusetts, USA, Wetlands Ecology and Management, 13:367–375.
Calhoun, A.J.K., T.E. Walls, S.S. Stockwell, and M. McCollough, 2003. Evaluating vernal pools as a basis for conservation strategies: A case study, Wetlands, 23(1):70–81.
Calhoun, A., and P. deMaynadier (editors), 2008. Science and Conservation of Vernal Pools in Northeastern North America, CRC Press, Boca Raton, Florida, 363 p.
Colburn, E.A., 2004. Vernal Pools: Natural History and Conservation, Blacksburg, Virginia: McDonald and Woodward Publishing Company.
Commonwealth of Massachusetts Division of Fisheries and Wildlife, 2001. Guidelines for the Certification of Vernal Pool Habitat, Westborough, Massachusetts, National Heritage and Endangered Species Program.
Congalton, R.G., and K. Green, 2009. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices, Second edition, New York: Lewis Publishers, 183 p.
Frohn, R.C., M. Reif, C. Lane, and B. Autrey, 2009. Satellite remote sensing of isolated wetlands using object-oriented classification of Landsat-7 data, Wetlands, 29(3):931–941.
Grant, E.H.C., 2005. Correlates of vernal pool occurrence in the Massachusetts, USA landscape, Wetlands, 25(2):480–487.
Guisan, A., T.C. Edwards, and T. Hastie, 2002. Generalized linear and generalized additive models in studies of species distributions: Setting the scene, Ecological Modelling, 157:89–100.
Guisan, A., and N.E. Zimmermann, 2000. Predictive habitat distribution models in ecology, Ecological Modelling, 135:147–186.
Hirzel, A.H., V. Helfer, and F. Metral, 2001. Assessing habitat-suitability models with a virtual species, Ecological Modelling, 145:111–121.
Hosmer, D.W., and S. Lemeshow, 2000. Applied Logistic Regression, New York: Wiley.
Iverson, L.R., and A.M. Prasad, 1998. Predicting abundance of 80 tree species following climate change in the eastern United States, Ecological Monographs, 68(4):465–485.
Jenness, J., 2004. Convex hulls around points (conv_hulls_pts.avx) extension for ArcView 3.x, v. 1.21, Jenness Enterprises, URL: http://jennessent.com/arcview/convex_hulls.htm (last date accessed: 21 October 2012).
Lawrence, R.L., and A. Wright, 2001. Rule-based classification systems using Classification and Regression Tree (CART) analysis, Photogrammetric Engineering & Remote Sensing, 67(10):1137–1142.
Lehmann, A., 1998. GIS modeling of submerged macrophyte distribution using generalized additive models, Plant Ecology, 139:113–124.
Leibowitz, S.G., 2003. Isolated wetlands and their functions: An ecological perspective, Wetlands, 23(3):517–531.
MassGIS, 2002. Land Use, URL: http://www.mass.gov/mgis/lus.htm (last date accessed: 21 October 2012).
Muñoz, J., and A.M. Felicísimo, 2004. Comparison of statistical methods commonly used in predictive modeling, Journal of Vegetation Science, 15:285–292.
National Heritage and Endangered Species Program, 2000. Potential Vernal Pools data layer, URL: http://www.mass.gov/anf/research-and-tech/it-serv-and-support/application-serv/office-of-geographic-information-massgis/datalayers/pvp.html (last date accessed: 21 October 2012).
National Heritage and Endangered Species Program, 2002. Certified Vernal Pools data layer, URL: http://www.mass.gov/mgis/cvp.htm (last date accessed: 21 October 2012).
National Oceanic and Atmospheric Administration (NOAA), National Climatic Data Center, 2005. Climate of Massachusetts, URL: http://gis.ncdc.noaa.gov/map/viewer/#cfg=cdo&theme=normals&layers=01&node=gis (last date accessed: 21 October 2012).
North, M.P., J.F. Franklin, A.B. Carey, E.D. Forsman, and T. Hamer, 1999. Forest stand structure of the northern spotted owl’s foraging habitat, Forest Science, 45(4):520–527.
Semlitsch, R.D., and J.R. Bodie, 1998. Are small, isolated wetlands expendable?, Conservation Biology, 12(5):1129–1133.
Skidds, D.E., and F.C. Golet, 2005. Estimating hydroperiod suitability for breeding amphibians in southern seasonal forest ponds, Wetlands Ecology and Management, 13:349–366.
Sperduto, M.B., and R.G. Congalton, 1996. Predicting rare orchid (small whorled pogonia) habitat using GIS, Photogrammetric Engineering & Remote Sensing, 62(11):1269–1279.
Story, M., and R.G. Congalton, 1986. Accuracy assessment: A user’s perspective, Photogrammetric Engineering & Remote Sensing, 52(3):397–399.
Van Meter, R., L.L. Bailey, and E.H.C. Grant, 2008. Methods for estimating the amount of vernal pool habitat in the northeastern , Wetlands, 28(3):585–593.
Vogiatzakis, I.N., 2003. GIS-based modeling and ecology: A review of tools and methods, Geographical Paper No. 170.
Wolfson, L., D. Mokma, G. Schultink, and E. Dersch, 2002. Development and use of a wetlands information system for assessing wetland functions, Lakes & Reservoirs: Research and Management, 7:207–216.

(Received 12 January 2012; accepted 01 August 2012; final version 07 August 2012)
