Methods to Account for Spatial Autocorrelation in the Analysis of Species Distributional Data: a Review’’

Ecography 32: 374Á378, 2009 doi: 10.1111/j.1600-0587.2008.05562.x # 2009 The Authors. Journal compilation # 2009 Ecography Comment on ‘‘Methods to account for spatial autocorrelation in the analysis of species distributional data: a review’’ Matthew G. Betts, Lisa M. Ganio, Manuela M. P. Huso, Nicholas A. Som, Falk Huettmann, Jeff Bowman and Brendan A. Wintle M. G. Betts ([email protected]), L. M. Ganio, M. M. P. Huso and N. A. Som, Dept of Forest Ecosystems and Society, Oregon State Univ., Corvallis, OR 97331, USA. Á F. Huettmann, Biology and Wildlife Dept, Inst. of Arctic Biology, Univ. of Alaska-Fairbanks, Fairbanks, AK 99775-7000, USA. Á J. Bowman, Ontario Ministry of Natural Resources, Wildlife Research and Development Section, 2140 East Bank Drive, Peterborough, ON K9J 7B8, Canada. Á B. A. Wintle, School of Botany, Environmental Science, The Univ. of Melbourne, Victoria, Australia. In a recent paper, Dormann et al. (2007) (hereafter and interpretation of these models may result in misleading Dormann et al.) conducted a review of approaches to conclusions. account for spatial autocorrelation in species distribution Our critique is on three grounds. First, we show that the models. As the review was the first of its kind in the change Dormann et al. observed in the parameter estimate ecological literature it has the potential to be an important between the autocovariate approach and the true value is and influential source of information guiding research. due to multicollinearity between environment and space. Although many spatial autocovariance approaches may Variation shared among parameters is a common occur- seem redundant in the spatial processes they reflect, rence in ecological models and can rarely be avoided seemingly subtle differences in approach can have major (Graham 2003); however, it can be directly measured using implications for the resulting description of the data and hierarchical partitioning approaches (Chevan and Suther- conclusions drawn. Though Dormann et al.’s review of the land 1991). Second, there are situations in which auto- available approaches was a step in the right direction, we covariate approaches offer the opportunity to incorporate think that their simulation study ignored important con- effects of behavioural and population processes into cepts which leads us to question some of their conclusions. ecological models. This may result in greater understanding One of Dormann et al.’s primary conclusions was that of these processes even though interpretation of the parameter estimates for most spatial modeling techniques estimated coefficients themselves may not be possible. were not strongly biased except in the case of autocovariate Third, we highlight that statistical regression models are models. In the autocovariate model, as implemented by developed for different objectives than outlined by Dorman Dormann et al., the parameter representing the effect of et al. In particular, the goal of predicting future or non- environmental variables on species distributions (the coeffi- sampled observations invites a very different model-build- cient for rain) was consistently underestimated. For this ing strategy than the goal of interpretation of model reason Dormann et al. cautioned the use of autocovariate coefficients (Hastie and Tibshirani 1990). Because of this, approaches. This caution reiterated findings from a similar we argue that comparison of models with different simulation in which (Dormann 2007) argued that auto- objectives should not be limited to an evaluation of only covariate logistic regression models used for binomially bias. We show that the autocovariate approach can be a distributed data (autologistic models) would be biased and useful model if minimizing prediction error is the objective. unreliable. These results appear to be in direct contrast to For brevity, in this paper we focus on autologistic regression for Bernoulli distributed data however, we believe our earlier evaluations of this method (Augustin et al. 1996, arguments are applicable to autocovariate methods used Hoeting et al. 2000, He et al. 2003) and need to be with Poisson and normally distributed data. considered seriously, as autocovariate approaches are now widely used in ecology (Piorecky and Prescott 2006, Wintle and Bardos 2006, McPherson and Jetz 2007, van Teeffelen Multicollinearity of space and environment and Ovaskainen 2007, Miller et al. 2007); for instance the seminal paper on autologistic regression (Augustin et al. Dormann et al. generated artificial distribution data in 1996) has now been cited 222 times (Web of Science which a hypothetical species was positively influenced by accessed 8 September 2008). Simplified implementation rainfall. The authors also simulated spatially correlated 374 errors. The realization of this data generation process was a species distributed as a function of only rainfall and space; this simulation could be thought of as reflecting the realistic scenario that a species is influenced by both the environment and some sort of aggregative process (e.g. dispersal limitation, conspecific attraction; see below). Examination of a map of Dorman et al.’s simulated data clearly reveals a species that is clustered in space (Fig. 1). However, because rainfall itself is positively spatially autocorrelated (Fig. 2), there is overlap in the effects of environmental and aggregative processes on species clustering. Autocovariate models include a covariate (autocovi)to model the influence of ki neighbors at a distance (hij) from a focal site i: Xki w y Figure 2. Degree of spatial autocorrelation, as measured by ij j Moran’s I, in two environmental variables simulated by Dormann j1 et al. (2007). Distance is measured as number of cells. autocovi Xki wij j1 In the case of the Dormann et al. data, we expected some of the variation in species presence to be shared by rainfall The autocovariate, autocovi, is a weighted average of k and the autocovariate. To test this hypothesis, we used the values in the neighbourhood of cell i. The weight given to hierarchical partitioning method (Whittaker 1984, Chevan any neighbouring point j is wij 1/hij where hij is (usually) and Sutherland 1991, Lawler and Edwards 2006) to the Euclidean distance between points i and j. If the species estimate the amount of deviance that is: a) explained is present at point j then y 1, otherwise y 0 (Augustin j j independently by the environmental variable (rainfall; R ) et al. 1996). This covariate is added to a generalized linear I or b) independently by space (the autocovariate; A ), c) model (glm) to account for the variation explained by space. I jointly explained by both variables (R A ), and d) In this case, the observed data are the presence or absence of J J explained by rain in a simple regression model (RT). As the species, Yi which is Bernoulli distributed with a mean r . Then the glm is: expected, over the 10 datasets simulated by Dormann et al., i the proportion of the deviance explained by rain that was logit (ri)In (ri=1ri) shared by the autocovariate was large ([RJAJ]/RT 100 56943% SD). In contrast, only 691% of the explained b b rainfall b autocov 0 1 i 2 i deviance in species distribution could be independently Where b0 is the model intercept, b1 and b2 are parameter attributed to rainfall ((RI/Total explained)100; Fig. 3). estimates for rainfall and the autocovariate respectively, and If two predictor variables in the same model overlap in rainfalli and autocovi are the values of predictor variables at their contribution to the model, coefficients of both the ith site. variables may change radically in comparison to the case Figure 1. One of ten spatial distributions of the hypothetical species generated by Dormann et al. (2007). This distribution shows strong spatial aggregation in the species that is due to both Figure 3. The proportion of variation explained independently by the environment (in this case rainfall) and spatial processes. Black rain and space and explained jointly by both variables (Shared) and gray shaded points are species presences and absences across the ten datasets simulated by Dormann et al. (2007). Error respectively. bars show SE. 375 where each variable occurs on its own in a simple regression from our simulation were higher than expected (mean bˆ model (Wonnacott and Wonnacott 1981). The absolute 0.29290.06 [SE]). This contrasts sharply with the results magnitude of partial regression coefficients increases with of Dormann et al. who found coefficients of the spatially increasing collinearity (Petraitis et al. 1996). The reason for autocorrelated predictor variable, rainfall, to shrink by a the bias observed by Dormann et al. was that information factor of five (0.003/0.0006; p. 618). By changing about species occurrence was shared by the autocovariate only the spatial structure of the explanatory variable alone and rainfall. The spatial aggregation effect simulated by we completely reversed the results reported by Dormann Dormann et al. was large and correlated with rainfall so that et al. In this case, the increased value of the environmental the presence of the autocovariate in the model changed the coefficient was due to the fact that the jointly explained estimated coefficient for the effect of rainfall on species deviance for the autocovariate and djungle was negative occurrence. (1091% SE). To demonstrate this point further we present two tests. It is important to discuss why the eight other methods First, if our argument is true we expect there to be a tested by Dormann et al. to account for spatial autocorrela- negative correlation between the proportion of deviance tion in binary data do not exhibit the same apparent bias in shared by rainfall and space (RJAJ) and the degree to parameter estimates. As noted by Dormann et al., only which the coefficient for rainfall changes as a function of autocovariate regression and spatial eigenvector mapping including or excluding the autocovariate. Using the 10 (SEVM) methods account for spatial autocorrelation via datasets simulated by Dormann et al., we found this pattern additional explanatory variables.

Methods to Account for Spatial Autocorrelation in the Analysis of Species Distributional Data: a Review’’

Fast Computation of the Deviance Information Criterion for Latent Variable Models

A Generalized Linear Model for Binomial Response Data

Comparison of Some Chemometric Tools for Metabonomics Biomarker Identification ⁎ Réjane Rousseau A, , Bernadette Govaerts A, Michel Verleysen A,B, Bruno Boulanger C

Statistics 149 – Spring 2016 – Assignment 4 Solutions Due Monday April 4, 2016 1. for the Poisson Distribution, B(Θ) =

Bayesian Methods: Review of Generalized Linear Models

What I Should Have Described in Class Today

The Deviance Information Criterion: 12 Years On

Heteroscedastic Errors

And the Winner Is…? How to Pick a Better Model Part 2 – Goodness-Of-Fit and Internal Stability

Robustbase: Basic Robust Statistics

Robust Fitting of Parametric Models Based on M-Estimation

Tree Species Richness Predicted Using a Spatial Environmental Model Including Forest Area and Frost Frequency, Eastern USA