Current State of the Art for Statistical Modeling Of
Total Page:16
File Type:pdf, Size:1020Kb
Chapter 16 Current State of the Art for Statistical Modelling of Species Distributions Troy M. Hegel, Samuel A. Cushman, Jeffrey Evans, and Falk Huettmann 16.1 Introduction Over the past decade the number of statistical modelling tools available to ecologists to model species’ distributions has increased at a rapid pace (e.g. Elith et al. 2006; Austin 2007), as have the number of species distribution models (SDM) published in the literature (e.g. Scott et al. 2002). Ten years ago, basic logistic regression (Hosmer and Lemeshow 2000) was the most common analytical tool (Guisan and Zimmermann 2000), whereas ecologists today have at their disposal a much more diverse range of analytical approaches. Much of this is due to the increasing availa- bility of software to implement these methods and the greater computational ability of hardware to run them. It is also due to ecologists discovering and implementing techniques from other scientific disciplines. Ecologists embarking on an analysis may find this range of options daunting and many tools unfamiliar, particularly as many of these approaches are not typically covered in introductory university sta- tistics courses, let alone more advanced ones. This is unfortunate as many of these newer tools may be more useful and appropriate for a particular analysis depending upon its objective, or given the quantity and quality of data available (Guisan et al. 2007; Graham et al. 2008; Wisz et al. 2008). Many of these new tools represent a paradigm shift (Breiman 2001) in how ecologists approach data analysis. In fact, T.M. Hegel (*) Yukon Government, Environment Yukon, 10 Burns Road, Whitehorse, YT, Canada Y1A 4Y9 e-mail: [email protected] S.A. Cushman US Forest Service, Rocky Mountain Research Station, 800 E Beckwith, Missoula, MT 59801, USA J. Evans Senior Landscape Ecologist, The Nature Conservancy, NACR – Science F. Huettmann EWHALE lab- Biology and Wildlife Department, Institute of Arctic Biology, University of Alaska-Fairbanks, 419 IRVING I, Fairbanks, AK 99775-7000, USA e-mail: [email protected] S.A. Cushman and F. Huettmann (eds.), Spatial Complexity, Informatics, 273 and Wildlife Conservation DOI 10.1007/978-4-431-87771-4_16, © Springer 2010 274 T.M. Hegel et al. for a number of these approaches, referring to them as new is a misnomer since they have long been used in other fields and only recently have ecologists become increasingly aware of their usefulness (Hochachka et al. 2007; Olden et al. 2008). The purpose of this chapter is to introduce and provide an overview of the current state of the art of tools for modelling the distribution of species using spatially explicit data, with particular reference to mammals. We include statisti- cal approaches based on data models (e.g. regression) and approaches based on algorithmic models (e.g. machine learning, data mining). Breiman (2001) refers to these approaches as the two cultures. Our goal is not to recommend one approach over another, but rather to provide an introduction to the broad range of tools available, of which many ecologists may not be familiar. Our descriptions of these approaches are admittedly brief due to the necessity of space, and indeed a complete review would require an entire book itself. We hope that our overview provides suf- ficient information for a starting point to search out more detailed information for an analysis. Indeed, we strongly recommend those interested in using any of the tools described here to become familiarized with additional resources, which we have attempted to provide as references. We avoid a detailed discussion of animal and environmental data, as this is covered at length elsewhere in this book (Part III); nor do we delve in depth into the theory of animal–habitat relationships which is also previously discussed (Part II). We begin by outlining some basic concepts and definitions providing an ecological context for SDMs. Following this we briefly describe the types of data used for SDMs and how this affects model interpretation. Subsequently, we outline statistical modelling tools within the data model realm, followed by tools grouped under algorithm models. Finally, we provide an overview of a number of approaches for model evaluation. 16.2 Species Distribution Models in Their Ecological Context 16.2.1 The Ecological Niche Ecological theory suggests that species exhibit a unimodal response to limiting resources in n-dimensional ecological space (Whittaker 1975; Austin 1985; ter Braak 1986). The volume of this ecological space in which an organism can survive and reproduce defines its environmental niche (Hutchinson 1957). Many SDMs are based on this niche concept, in which the niche is as an n-dimensional hyper- volume, where axes represent n resources limiting an organism’s fitness. The niche is defined by the boundaries of these resources, with the volume itself representing the total range of resources providing for the average fitness of an organism to be zero or greater. That is, these boundaries identify the range in which a species can physiologically persist. Hutchinson (1957) proceeded to differentiate between the fundamental niche, described above, and the realized niche in which these resource boundaries are reduced due to inter-specific interactions (e.g. competition, predation). The fundamental niche can be thus viewed as the theoretical limits of resources 16 Current State of the Art for Statistical Modelling 275 allowing an organism to persist, whereas the realized niche is the actual limits of resources in which an organism exists. For a sample of additional resources on the niche concept readers are referred to Chase and Leibold (2003), Kearney (2006), Pulliam (2000), Soberón (2007), and Soberón and Peterson (2005). Quantification of niche space at the species level is a first step toward predicting the distribution, occurrence, or abundance of wildlife species with SDM approaches. Often, the large number of factors which compose the niche can be reduced to a relative few that explain much of the variance in species responses. This technique is heuristically powerful, but it can often obscure relationships between mechanism and response. Importantly, without clear linkages between cause and effect, reliable extrapolation to new conditions (e.g. different study areas, future predictions) is problematic. Therefore, it is preferable to identify limiting factors, which are key variables associated with species tolerances that explain substantial proportions of variance and make sense in terms of well-understood mechanisms. 16.2.2 Scale and Spatial Complexity Biophysical gradients are clines in n-dimensional ecological space. In geographical space these gradients often form complex patterns across a range of scales. The fundamental challenge of using SDMs to predict habitat suitability and occurrence in complex landscapes is linking non-spatial niche relationships with the complex patterns of how environmental gradients overlay heterogeneous landscapes (Cushman et al. 2007). By establishing species optima and tolerances along environmental gradients, researchers can quantify the characteristics of each species’ environ- mental niche. The resulting statistical model can be used to predict the biophysical suitability of each location on a landscape for each species. This mapping of niche suitability onto complex landscapes is the fundamental task required to predict individualistic species responses. High levels of spatial and temporal variability are typically found in ecological systems. This variability in environmental conditions strongly affects the distribution and abundance of species and the structure of biological communities across the landscape. Details of the spatial and temporal structure of ecosystems are impor- tant at a range of scales. There is no single correct scale of analysis for SDM. The fundamental unit of ecological analysis is the organism (Schneider 1994) and fundamental scales are those at which the organism strongly interacts with critical or limiting resources in its environment. Each species will respond to factors across a range of scales in space and time based on its life history strategy and ecological adaptations (Cushman et al. 2007). Ecological responses to environmental gradients must be quantified at scales that match the biological interactions of individual organisms. Analyses at inappropriate scales risk missing or misconstruing relationships between mechanisms and responses (Wiens 1989; Cushman and McGarigal 2003). Accounting for multiple interactions across ranges of spatial and temporal scales is the fundamental challenge to understanding relationships between species distributions and environmental variables in complex landscapes (Levin 1992; 276 T.M. Hegel et al. Turner et al. 2003). Where data allow, it is advantageous to quantitatively measure the relationships among driving factors across a range of scales simultaneously to identify these dominant scales and quantify interaction of factors across scale (e.g. Cushman and McGarigal 2003). Ideally, ecological analysis will therefore not be between hierarchical levels, such as populations, communities, or ecosystems, but instead will focus on relationships among organisms and driving processes across continuous ranges of scale (Levin 1992; Cushman et al. 2007). The literature surrounding SDM consists of a myriad of confusing terminology (Hall et al. 1997; Mitchell 2005; Kearney 2006). There has historically been two classes of SDMs: distribution models (DMs; Soberón and