Global Ecology and Biogeography

Is my species distribution model fit for purpose? Matching data and models to applications

Journal: Global Ecology and Biogeography

Manuscript ID: GEB-2014-0185.R2

Manuscript Type: Research Reviews

Date Submitted by the Author: n/a

Complete List of Authors:
Guillera-Arroita, Gurutzeta; University of Melbourne, School of Botany
Lahoz-Monfort, Jose; University of Melbourne, School of Botany
Elith, Jane; University of Melbourne, School of Botany
Gordon, Ascelin; RMIT University, School of Global, Urban and Social Studies
Kujala, Heini; University of Melbourne, School of Botany
Lentini, Pia; University of Melbourne, School of Botany
McCarthy, Michael; University of Melbourne, School of Botany
Tingley, Reid; University of Melbourne, School of Botany
Wintle, Brendan; University of Melbourne, School of Botany

Keywords: ecological niche model, habitat model, imperfect detection, presence-absence, presence-background, presence-only, prevalence, sampling bias


Is my species distribution model fit for purpose? Matching data and models to applications

Gurutzeta Guillera-Arroita¹, José J. Lahoz-Monfort¹, Jane Elith¹, Ascelin Gordon², Heini Kujala¹, Pia E. Lentini¹, Michael A. McCarthy¹, Reid Tingley¹ and Brendan A. Wintle¹

Postal addresses:
¹ School of Botany, The University of Melbourne, Parkville, 3010, Victoria, Australia
² School of Global, Urban and Social Studies, RMIT University, Melbourne, 3001 Victoria, Australia

Email addresses: Gurutzeta Guillera-Arroita: [email protected]; José J. Lahoz-Monfort: [email protected]; Jane Elith: [email protected]; Ascelin Gordon: [email protected]; Heini Kujala: [email protected]; Pia E. Lentini: [email protected]; Michael A. McCarthy: [email protected]; Reid Tingley: [email protected]; Brendan A. Wintle: [email protected]

Keywords: ecological niche model, habitat model, imperfect detection, presence-absence, presence-background, presence-only, prevalence, sampling bias

Running title: Matching distribution models to applications

Author for correspondence: Gurutzeta Guillera-Arroita, School of Botany, University of Melbourne, Parkville, 3011, Victoria, Australia. tel +61 3 90355479, fax +61 3 9347 5460, email: [email protected]

Words in abstract: 201
Words in main body of paper (from Introduction through Biosketch): 9541
Number of references: 82

ABSTRACT

Species distribution models (SDMs) are used to inform a range of ecological, biogeographical and conservation applications. However, users often underestimate the strong links between data type, model output and suitability for end-use. We synthesize current knowledge and provide a simple framework that summarizes how interactions between data type and the sampling process (i.e., imperfect detection and sampling bias) determine the quantity that is estimated by an SDM. We then draw upon the published literature and simulations to illustrate and evaluate the information needs of the most common ecological, biogeographical and conservation applications of SDM outputs. We find that, while predictions of models fitted to the most commonly available observation data (presence records) suffice for some applications, others require estimates of occurrence probabilities, which are unattainable without reliable absence records. Our literature review and simulations reveal that, while converting continuous SDM outputs into categories of assumed presence or absence is common practice, it is seldom clearly justified by the application’s objective and it usually degrades inference. Matching SDMs to the needs of particular applications is critical to avoid poor scientific inference and management outcomes. Our paper aims to help modellers and users assess whether their intended SDM outputs are indeed fit for purpose.

INTRODUCTION

Models play a critical role in conservation decision-making and ecological or biogeographical inference, but can lead to suboptimal conservation outcomes and misguided theory if the underlying data do not suit the intended application. Building models with unsuitable data can waste valuable resources and deliver outputs that do not solve the problem at hand.

Species distribution models (SDMs) have become a fundamental tool in ecology, biogeography, biodiversity conservation, and natural resource management (Guisan & Thuiller, 2005; Newbold, 2010; Franklin, 2013; Guisan et al., 2013). SDMs typically correlate the presence (or presence/absence) of species at multiple locations with relevant environmental covariates to estimate habitat preferences or predict distributions; these outputs are commonly used to inform ecological and biogeographical theory as well as conservation decisions (e.g. Akcakaya et al., 1995; Pearce & Lindenmayer, 1998; Ferrier et al., 2002; Bekessy et al., 2009; Keith et al., 2014). A review of current use (see Appendix S1 in Supporting Information) reveals that SDMs are applied broadly, including for managing threatened species (16% of papers), controlling threatening processes (8%), predicting impacts of climate change (13%), understanding phylogeographic patterns (9%), and managing landscapes (8%) and biological invasions (7%); and diverse modeling algorithms are used (34 methods in 100 randomly selected papers).

SDMs can also be built with different types of species data, and these fundamentally affect the meaning of the quantity that is estimated. SDMs are, therefore, particularly prone to problems arising from a mismatch between data type and intended purpose. For example, some SDM applications require that the probability of a species occurring at a site (e.g. Hauser & McCarthy, 2009), or the total area occupied (e.g. Keith et al., 2008) be known. Other applications only require knowledge about relative site suitability so that the best (or worst) sites can be identified (e.g. Moilanen et al., 2005). Although recent studies have mentioned such variation in required information (e.g. Lawson et al., 2014), a comprehensive evaluation of how different sampling processes, data types and modeling approaches influence the utility of SDMs across applications is lacking. Here, we aim to raise awareness of this often disregarded, but critical issue.

We distinguish three types of data that are typically used in correlative SDMs, which we label as presence-background, presence-absence and occupancy-detection (hereafter PB, PA and DET, respectively). With PB data (used in 53% of reviewed papers), only presence records (a non-exhaustive sample of all true presences) and environmental information are available. This type of data is often referred to as presence-only data, but we avoid this term as discussed below. Many methods exist for modeling species distributions based on PB data, including the widely used Maxent (Phillips & Dudík, 2008; used in 41% of papers), various implementations of regression methods (e.g., Elith et al., 2006), and the broader class of spatial point process models (PPMs) which have received recent attention (Warton & Shepherd, 2010; Renner & Warton, 2013). For simplicity we will here discuss the data first with reference to sites (since this is how ecologists often think about data), and later clarify how this links with data structures for PPMs.

PB methods estimate habitat preferences by comparing the environmental characteristics at sites where the species has been recorded (P) to those throughout the region modeled, which we refer to as the “background” (B). In contrast, PA data (47% of papers) provide information on whether a species was detected or not detected at a set of sampling sites. Logistic regression is commonly used to analyze PA data, but so are other statistical and machine learning techniques (e.g. see a review in Elith & Franklin, 2013). PA methods estimate the probability of observing a species at a site by comparing the environmental characteristics at sites where the species was detected with those at sites where it was not.

Occupancy-detection (DET) data (5% of papers) also consist of detection and non-detection records, but these are collected in such a way that the detection (or observation) process can be explicitly modeled within the SDM (Lahoz-Monfort et al., 2014). For instance, sampling may involve collecting data from repeat visits to surveyed sites (MacKenzie et al., 2002; Stauffer et al., 2002; Tyre et al., 2003) or recording times to detection during a single visit (Garrard et al., 2008; Guillera-Arroita et al., 2011). This provides information about the probability of detecting the species given that it is present at a site, and how that probability may vary from site to site or visit to visit (e.g. Wintle et al., 2005b), allowing models to account for imperfect detection in the estimation of species occupancy probability.

As noted above, PB data are also sometimes referred to as presence-only data, but here we avoid this term to prevent confusion with true presence-only methods (PO, used in 11% of papers; e.g. climatic envelopes like BIOCLIM, Busby, 1991; Booth et al., 2014), which utilize information about sites where the species was detected without considering the environmental conditions in the rest of the landscape.
We have chosen not to discuss PO methods here because they do not discriminate between environmental suitability and landscape characteristics (i.e., availability; Elith et al., 2011). We note that using PO data limits SDMs by at least the same degree as working with PB data. Using PO methods does not solve issues related to working with PB methods; hence the issues we highlight regarding the limitations of SDMs fitted to PB data also apply to PO approaches.

For a given species, the probability of a site being recorded as a detection in a dataset is determined by three probabilities: the probability that the species occupies the site, the probability that the site is sampled, and the probability that the species is detected given it is present at the site (Yackulic et al., 2013). The first of these, the probability of occupancy, is the most common object of inference in an SDM. The second, the probability of a site being surveyed, can be affected by sampling bias (e.g. when surveys are only conducted near populated areas or reserves), whereas the third, detectability, may vary with environmental variables. As we later explain, our ability to deal with these three probabilities depends critically on the type of data that are used to build SDMs, and this in turn determines what SDMs can estimate. DET is the richest type of data in terms of information content, followed by PA, and finally PB. However, this ordering is reversed when we consider data availability, and quite often PB is the only type of data that can be obtained (e.g. records from museum or herbaria specimens, Newbold, 2010). Approximately half of recently published papers involving SDMs rely on PB data (Appendix S1). Hence, it is crucial to understand whether, given the available data, the output of a proposed SDM will be appropriate for the intended application.

The suitability of an SDM for a given application also relies critically on how SDM raw outputs are used.
Our review (Appendix S1) revealed that 54% of recent papers converted the continuous output produced by SDMs into binary (discretized) predictions of “presence/absence” or “habitat/non-habitat”. There appears to be a belief that this step is required by many ecological, biogeographical or conservation applications (e.g. Jiménez-Valverde & Lobo, 2007; Li & Guo, 2013; Liu et al., 2013). However, attention is rarely paid to whether a binary output is indeed required, or whether this may lead to an unnecessary loss of information and hence be detrimental in the context of the intended application (but see Merow et al., 2013; Calabrese et al., 2014; Lawson et al., 2014). Given the prevalence of binary conversion of predictions and the paucity of critical appraisal of this issue in the literature, we explore the necessity and potential impacts of discretizing SDM outputs in the context of specific ecological, biogeographical and conservation applications.

In summary, despite being an essential consideration in modeling, confusion remains about the limitations of different types of data for building SDMs and about the uses various outputs can justifiably support. This confusion results in insufficient evaluation of whether SDM outputs are appropriate for proposed applications, a failing we aim to redress here. Our specific aims are twofold. First, we clarify and synthesize current knowledge regarding the inherent properties and limitations of correlative SDMs with respect to the type of data that are used to build them. To do this, we provide a framework that clarifies how the interaction between sampling process and type of species data determines the quantity that is ultimately estimated from an SDM.
Our second aim is to evaluate the information needs of a range of ecological, biogeographical and conservation applications that typically use SDMs as inputs, including whether the conversion of continuous SDM outputs into binary predictions is necessary or appropriate. Our paper provides modelers and managers with a means to assess whether the species data that they have available or are planning to collect are appropriate for their intended use. Since the focus of this paper is to identify how types of species data limit uses of SDMs, we assume that other relevant aspects of the model building process are well covered. These include having access to a dataset that is sufficiently large and appropriately collected, considering all the relevant environmental covariates, and using modeling techniques and associated software programs correctly. Aside from particular biases induced by certain data types, we do not consider these issues further.

THE INTERPLAY BETWEEN DATA TYPES AND BIASES IN SDMs

There are three fundamental issues to consider with respect to the type of species data used to build an SDM: i) whether prevalence (the proportion of sites occupied) can be estimated; ii) the impact of imperfect detection; and iii) the impact of environmental sampling bias (Fig. 1a). These factors, discussed in more detail below, dictate the types of application an SDM can reliably support given that they determine whether the resulting model estimates: i) the probability of species occurrence, ii) a relative likelihood of species occurrence (which is proportional to the actual probability), iii) a correct ranking of the sites in terms of occurrence probability; or whether it provides iv) a biased estimation of species occurrence in which not even the correct ranking of sites in terms of true occurrence probability is achieved.

In ecology, biogeography and conservation most users want to know the probability that a species occupies a given environment, which when mapped represents an estimate of its geographic distribution. This might be substantially different to where the species is most likely to be observed, which is what the least informative models estimate (Fig. 1a, column 1). Column 2 (yellow in Fig. 1a) represents models that correctly rank the suitability of locations for the species, although the predicted suitability is not proportional to the actual probability of occurrence (Fig. 1b). Models in columns 3 and 4 naturally provide such a ranking as well, but importantly also capture the shape of the environmental relationships explaining species occurrence probability, either directly or with a constant scaling.

SDMs that achieve any of the three top information content categories (columns 2-4) can have good discrimination abilities (i.e. distinguish between occupied and unoccupied sites better than random; Pearce & Ferrier, 2000; Phillips et al., 2006). However, only those that estimate probabilities can achieve good calibration (i.e., agreement between predicted probabilities and observed proportions of sites occupied, Pearce & Ferrier, 2000). Some of these categories of information content can be attained with different data types, given particular conditions of sampling bias and imperfect detection (Fig. 1), as discussed below. What ultimately matters from an applied perspective is whether an SDM estimates the quantity required for a given use, be that a probability of occurrence, a relative likelihood, or a correct ranking of sites. Our diagram in Fig. 1a distinguishes between the information requirements of a particular application and the type of species data (PB, PA or DET) used to construct the SDM. Such decoupling allows evaluation of the requirements of different applications without repeated reference to data types or specific issues like bias or imperfect detection: the key is to identify the minimum information required for a given use.

Although for simplicity we present our categorization and discussion only from the point of view of estimating probabilities of species occurrence at sites, we note that the issues we cover apply equally well to cases where the focus of estimation is the intensity of points (observations) in the landscape.
This is the quantity estimated by point process models (PPMs; Warton & Shepherd, 2010; Renner & Warton, 2013), which are not intrinsically based on dividing space into discrete sites; probabilities can then be derived at any spatial resolution if needed. Regardless of whether intensities or probabilities are obtained, PPMs for PB data present the same limitations as other PB methods in terms of their ability to estimate prevalence (and hence absolute intensities/probabilities of occurrence) and are similarly affected by issues of sampling bias and imperfect detection, as discussed next.

Estimation of prevalence

Presence-background methods do not estimate actual probabilities, but relative likelihoods of species occurrence (or observation). From a set of presence records alone (PB data) it is not possible to distinguish whether a species is rare and well surveyed, or common but under-surveyed. Distinguishing between these two possibilities requires PA or DET data. Despite being openly acknowledged by developers of PB methods (e.g. Ferrier & Watson, 1997; Elith et al., 2006), this limitation is not always fully appreciated. Confusion remains among users, as indicated for instance by frequent assumptions that the output of Maxent is a probability of occurrence (a fact reported e.g. by Yackulic et al., 2013 and Guillera-Arroita et al., in press).

Some authors note that, under certain strong conditions, prevalence can be estimated (identified) from PB data (Lancaster & Imbens, 1996; Lele & Keim, 2006; Royle et al., 2012). However, these conditions involve very specific restrictions about how species occurrence varies with explanatory variables. Identifiability of prevalence largely relies on the true relationship conforming exactly to an assumed parametric form, which is highly unlikely to occur in reality (Hastie & Fithian, 2013; Phillips & Elith, 2013).
Mild departures from the assumed functional form can cause profound errors (Figure S2.1 in Appendix S2) and, even if the assumption is perfectly met, estimates remain very imprecise (Merow & Silander, 2014). Hence, for practical purposes, it can be argued that prevalence is not obtainable from PB data (Hastie & Fithian, 2013; Lele et al., 2013; Phillips & Elith, 2013).

Imperfect detection

It is widely recognized that surveys often fail to detect species present at a site (Yoccoz et al., 2001; Kéry, 2002), even for sessile species such as plants (Garrard et al., 2008; Chen et al., 2013). If detection is imperfect and this is not accounted for, occupancy and detection processes are confounded, and SDMs might estimate where the species is more likely to be observed rather than where it occurs (Gu & Swihart, 2004; Kéry, 2011; Lahoz-Monfort et al., 2014). Estimated distributions can be substantially different depending on whether detectability is accounted for or not (e.g. Kéry et al., 2010; Kéry et al., 2013).

The impact of ignoring imperfect detection is greatest when detectability is a function of environmental variables, because this distorts our understanding of even the general shape of the environmental relationships explaining the distribution of the species (see Figs. 1-2 in Lahoz-Monfort et al., 2014). Despite some past confusion in the literature, it is now clearly established that imperfect detection will affect both PA and PB methods in this situation (Elith et al., 2011; Dorazio, 2012; Yackulic et al., 2013; Lahoz-Monfort et al., 2014); i.e. using only presence records does not circumvent the problem of imperfect detection.
In the case of PA data, imperfect detection can lead to false absence records, while the effect on PB methods stems from the fact that presence records do not provide a random sample of locations in which the species is actually present. Although not always recognized, imperfect detection is also an issue when detection and occurrence depend on different and uncorrelated covariates. Bias can still be induced in the estimation of the species distribution as the modeling process can mistakenly identify detection covariates as relevant predictors of occurrence probability (Lahoz-Monfort et al., 2014).

Acknowledging that detection can be imperfect implies accepting that PA and PB methods might provide a biased estimation of species distributions (column 1 in Fig. 1a). It is only under particular conditions that a more meaningful output can be obtained. If detectability is constant across sites (and there is no sampling bias), PB methods can reliably estimate the relative likelihood of species occurrence despite imperfect detection (column 3). Constant detectability might occur if the same observer carried out all surveys under similar conditions and in comparable habitats. The relative likelihood of species occurrence is the maximum amount of information that a PB method can yield, even when detection is perfect. Likewise, when PA data are available and detection is imperfect but constant, SDMs can estimate only a relative likelihood of species occurrence (column 3). However, with PA data there is also the potential to obtain unbiased estimation of occurrence probabilities when detection is perfect (column 4). When detectability depends on environmental covariates in the same way that species occurrence does (i.e. occupancy and detectability are positively correlated), the general shape of the estimated environmental relationship may be affected, but the correct ranking of the suitability of sites is maintained (column 2).

When appropriate data are available (i.e. DET data), models that estimate species occupancy while accounting for detectability separate detection from occurrence (MacKenzie et al., 2002; Stauffer et al., 2002; Tyre et al., 2003; Guillera-Arroita et al., 2011; Guillera-Arroita et al., 2014b), enabling unbiased estimation of occupancy probabilities even if detectability varies (provided the detection process is well characterized in the model). The lower the detectability of a species, the more data are required to precisely estimate its distribution (MacKenzie & Royle, 2005; Guillera-Arroita & Lahoz-Monfort, 2012).

Sampling bias

Presence-background methods work under the assumption that sampling is unbiased (Phillips et al., 2009; Elith et al., 2011). However, PB data seldom arise from random sampling, and often are collected in an opportunistic and spatially biased manner. Common examples include datasets derived from herbaria and museum specimens (Newbold, 2010). In such datasets, sampling is often biased towards accessible locations near roads or towns. The problem for modeling species distributions is not the spatial bias in itself, but a bias in how the available environmental conditions are sampled. For example, areas close to cities may tend to be at lower elevations and have higher soil fertility. In PB SDMs, sampling bias causes biased estimation of environmental relationships, with suitability over-estimated for environments that have been sampled more intensively, and under-estimated for those surveyed less frequently. Some methods have been proposed to mitigate the problem of sampling bias in PB SDMs. However, none of these can completely resolve the problem as they rely on indirectly inferring the sampling process (e.g. by considering records of other species as a proxy; Phillips et al., 2009), or on the ability to model the sampling process separately, which requires having good predictors of sampling effort and independence between those and the predictors of species occurrence (Chakraborty et al., 2011; Fithian & Hastie, 2013; Warton et al., 2013; Fithian et al., in press).
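The way sampling effort distorts a PB output can be illustrated with a small numerical sketch. It assumes, purely for illustration, that the chance of a site yielding a presence record is the product of the three probabilities introduced earlier (occupancy, being sampled, and detection given presence), and that an idealized PB method recovers record rates up to an unknown constant. The function names, the three environment classes and all numerical values are hypothetical and are not taken from the simulations in this paper.

```python
def record_probability(psi, s, p):
    """Chance that a site contributes a presence record.

    Illustrative assumption: occupancy (psi), being sampled (s) and
    detection given presence (p) combine multiplicatively.
    """
    return psi * s * p


def relative_likelihood(values):
    """Rescale values to sum to one, discarding the unknown constant."""
    total = sum(values)
    return [v / total for v in values]


def ranking(values):
    """Indices of the environments ordered from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical occupancy probabilities in three environment classes.
psi = [0.2, 0.4, 0.8]
p = 0.5  # constant detectability (hypothetical value)

# Unbiased sampling: every environment is surveyed with equal intensity.
unbiased = [record_probability(q, 0.1, p) for q in psi]

# Biased sampling: effort is concentrated where the species is rarest.
effort = [0.8, 0.2, 0.05]
biased = [record_probability(q, s, p) for q, s in zip(psi, effort)]

truth = relative_likelihood(psi)
print("true relative likelihood:", [round(v, 3) for v in truth])
print("PB output, unbiased     :", [round(v, 3) for v in relative_likelihood(unbiased)])
print("PB output, biased       :", [round(v, 3) for v in relative_likelihood(biased)])
print("ranking kept under bias?", ranking(biased) == ranking(psi))
```

Under these assumptions, equal effort leaves the rescaled record rates equal to the rescaled occupancy probabilities, whereas concentrating effort in the least suitable environment reverses the ranking of environments in this toy example, which corresponds to the least informative outcome (column 1 in Fig. 1a).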

For PB SDMs, sampling bias is analogous to imperfect detection when detectability depends on environmental covariates. In both cases the species is less likely to be detected in certain environments, either because it is truly more difficult to detect when those sites are surveyed (“true” detectability issue) or because those sites are less likely to be visited (environmental sampling bias). It is only when there is no sampling bias that PB methods are able to estimate the relative likelihood of species occupancy (column 3 in Fig. 1a); when there is environmental sampling bias but it is positively correlated with occupancy, site ranking is nevertheless maintained (column 2). Sampling bias is not only an issue when sampling effort is correlated with covariates determining species occurrence. Sampling effort that varies with factors uncorrelated with species occurrence can still bias the estimation of the relative likelihood of occurrence, as those factors can be incorrectly identified as relevant predictors in the model.

In contrast to PB models, the effect of sampling bias on PA or DET SDMs is very different and has less critical consequences, as it does not introduce bias in the estimation (Phillips et al., 2009). Instead, sampling bias implies reduced estimation precision for sites in the environmental space that are less intensively sampled.

SDM OUTPUTS IN ECOLOGICAL, BIOGEOGRAPHICAL AND CONSERVATION APPLICATIONS

SDM outputs need to match the objectives of the applications that they aim to inform.
In Appendix S3 we review extensively the information required of SDMs for ecological and conservation applications, organized into four tables that represent the following domains: (i) management of invasive species, (ii) management of threatened species, (iii) spatial planning, and (iv) ecological and biogeographical inference. For each application, we specify how SDM outputs are used, describe the potential consequences of using a biased estimation of occupancy, and discuss whether a relative likelihood or correct ranking is a sufficient input. Readers can search for their application in the first column of the tables, find the minimum level of information content required, and use Fig. 1 to determine the circumstances in which each data type will produce a suitable SDM output for that application. We also assess whether particular applications actually require a binary input (generated by applying a threshold to the SDM output), or whether working with the continuous SDM output is more appropriate.

In the following subsections, we explore five simulation-based case studies that characterize common uses of SDMs in ecology, biogeography and conservation (summarized in Table 1), and use these to illustrate key points regarding SDM data needs. For each case study, we assessed the consequences of using SDMs that estimate probabilities or relative likelihoods, considering both direct use of continuous SDM outputs and use after conversion to binary maps. For the latter, we tested a set of commonly used thresholds (Appendix S1; Liu et al., 2005): (i) the minimum estimate corresponding to a training presence (“MTP”); (ii) the 10th-percentile estimate for training presences (“10TP”); (iii) the threshold that resulted in equal sensitivity and specificity (“SS=SP”); and (iv) the threshold that maximized the sum of sensitivity and specificity (“max(SS+SP)”).
For PB data, specificity cannot be defined in the usual way given that absence records are not available; we followed the practice in Maxent of using the background data instead, so that the resulting measure is related to predicted area.

Case study 1: Wildlife monitoring

The area (or number of sites) occupied by a species (sometimes abbreviated AOO, “area of occupancy”) is the state variable of interest in many large-scale monitoring programs, which aim to track variation in AOO over time (MacKenzie et al., 2006, pp. 41-44). Both AOO and changes in AOO are also important elements of IUCN Red List assessments (criteria A, B2 and D2; IUCN, 2012). AOO can be estimated by summing the site occupancy probabilities obtained with an SDM over the landscape of interest. It is clear that this cannot be done based on relative likelihoods (since by definition AOO is the prevalence multiplied by the total area). Here we use a simulation study to show the extent to which it is possible to track temporal changes in AOO by fitting models that provide lower levels of information than occupancy probabilities. We also explore whether converting the output of SDMs into binary maps assists in estimating AOO.

We simulated a temporal decline in the AOO of a species (representing the ‘true’ change), and the sampling of large PA and PB datasets for 3 time steps, which, when analysed led to estimated probabilities and relative likelihoods of occupancy, respectively (details in Appendix S4). To estimate the change in AOO, SDM estimates were summed over the landscape, either directly as a continuous output or after applying the four thresholds listed above to convert them to binary maps.

The simulations demonstrate that relative likelihoods produced by a PB-based SDM are not comparable across time: prevalence is proportional to the declining AOO but cannot be estimated, and thus the resulting estimates are clearly unable to track declines in AOO (Fig. 2).
PB data cannot determine whether an apparent decrease in the number of detections for a species through time reflects an actual reduction in AOO or declining survey effort. Binary conversion of the SDM output prior to AOO calculation does not solve the problem of working with relative likelihoods of occurrence, because a binary categorization does not fix the fact that prevalence cannot be estimated without absence data. Furthermore, binary conversion is detrimental compared to using the actual probabilities of occurrence when available. This is because a binary categorization represents a coarse interpretation of species occurrence probabilities and reduces the information content compared to using the full range of values provided by the SDM. In our simulations, detection was assumed to be perfect, hence the PA-based SDM produced reliable estimates of probabilities of occurrence. However, imperfect detection can cause PA methods to miss trends or detect spurious ones. An exception to that rule occurs if detectability is constant across space and time (a relatively strong assumption). Then, PA methods estimate relative likelihoods of occupancy that track changes in AOO.

Case study 2: Invasive species priority lists

Biosecurity resources are limited, so it is common for government agencies to prioritize exotic species for management intervention. The potential distribution of an exotic species is a key indicator of its ability to spread, and is therefore frequently considered in this prioritization process. In Australia, for example, Weeds of National Significance are in part determined by examining the potential distribution of a range of candidate species under current and future climates (Lizzio et al., 2009). This listing process influences the allocation of potentially millions of dollars' worth of biosecurity resources.
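Both this case study and the previous one rest on estimating AOO as the sum of SDM outputs over the landscape. The following minimal sketch shows why that sum is informative for occupancy probabilities but not for relative likelihoods. It is not the simulation described in Appendix S4: the landscape, the decline factors, and the choice to normalise the relative output so that it sums to one are assumptions made purely for illustration.

```python
from typing import List

N_SITES = 100
# Hypothetical 'true' occupancy probabilities at the first time step.
BASE_PSI: List[float] = [(i + 1) / (N_SITES + 1) for i in range(N_SITES)]
# Assumed proportional decline in occupancy at each of three time steps.
DECLINE: List[float] = [1.0, 0.7, 0.4]


def summed_output(values: List[float]) -> float:
    """Sum an SDM output over all sites (the AOO estimator when values are probabilities)."""
    return sum(values)


def as_relative(psi: List[float]) -> List[float]:
    """Rescale probabilities by an unknown constant; here they are normalised to sum to one."""
    total = sum(psi)
    return [p / total for p in psi]


pa_aoo: List[float] = []   # sums of occupancy probabilities (PA-type output)
pb_sum: List[float] = []   # sums of relative likelihoods (PB-type output)
for factor in DECLINE:
    psi_t = [factor * p for p in BASE_PSI]
    pa_aoo.append(summed_output(psi_t))
    pb_sum.append(summed_output(as_relative(psi_t)))

print("summed probabilities:", [round(v, 2) for v in pa_aoo])
print("summed relative likelihoods:", [round(v, 2) for v in pb_sum])
```

The summed probabilities fall in step with the simulated decline, whereas the summed relative likelihoods stay constant because the scaling constant absorbs the change in prevalence. The same argument applies when the three outputs belong to three species rather than three time steps.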
Here we show that estimates of the relative likelihood of occupancy are not suitable for prioritizing species according to their potential area of occupancy (AOO). For this, we simulated a set of 25 species, sampled their distributions and built SDMs based on PA and PB datasets (details in Appendix S4). For each species, we computed the sum of the SDM estimates across the whole landscape, using the continuous output as well as the binary outputs obtained after applying the thresholds.

In statistical terms, the sum of estimated occupancy probabilities across the region is the expected value of the AOO, and hence it is a good estimator for AOO, leading to the correct prioritization of species (PA method, Fig. 3). However, if the output of the SDM is a relative likelihood of species occupancy, the AOO cannot be estimated, as we saw in case study 1. Crucially, the quantities obtained are not comparable across species, and hence species cannot be prioritized based on these data (PB method, Fig. 3). Applying a binary conversion to the SDM output does not solve the problem for the same reasons as described above. In this case study, detection was assumed to be perfect, hence the PA-based SDMs produced estimates of actual probabilities. Imperfect detection can also lead to incorrect prioritization when PA data are used, because, even if detectability is constant across space, it is likely to be species-specific.

Case study 3: Optimal surveillance of invasive species

SDMs can be used to inform invasive species surveillance. For instance, Hauser & McCarthy (2009) proposed a spatial detection and treatment model, and identified the level of surveillance that minimizes total expected costs while taking into account ease of detection and control, as well as the probability of species occurrence at the site. This approach was subsequently used to design surveys for invasive orange hawkweed Pilosella aurantiaca in southeastern Australia (Herbert et al., 2013).
The method considers the following scenario: a site is surveyed and, if the species is detected, the site is managed for eradication. If not detected, the species is assumed absent and the site is not managed. However, the species could have been present and not observed, in which case the lack of early management may lead to a wider infestation that ultimately incurs higher management costs. There is thus a trade-off between the amount of resources spent on surveying and early management on infested sites, and the resources ultimately spent in delayed management at sites where the species was present but undetected.

For this case study, we sampled the distribution of a simulated invasive species, built SDMs based on PA and PB datasets, and then applied the approach by Hauser & McCarthy (2009) to determine the optimal level of surveillance without imposing budget restrictions, using the SDM outputs as estimates of species occurrence probabilities (details in Appendix S4). Only when the SDM values used for the optimization represent actual probabilities of occurrence (PA method) can the potential savings from surveying be fully realized (Fig. 4; but see Guillera-Arroita et al. (2014a) for cases where relative probabilities may suffice). In our simulated example, applying the truly optimal level of surveillance reduced costs by 74%, compared to a situation without surveillance and early management. In contrast, if the level of surveillance was determined by wrongly interpreting the output of a PB method as probabilities, total costs increased by 124%.

Case study 4: Estimation of species richness

SDMs can be combined to model biodiversity at the community level following a “predict first, assemble later” strategy (Ferrier & Guisan, 2006), whereby separate SDMs for each species are stacked using some form of aggregation.
One example of this approach is where SDMs are summed site-by-site to derive predictions of species richness (i.e. number of species present at each site), which can then be used to explore further ecological or biogeographical questions, or to identify biodiversity hotspots for conservation (e.g. Parviainen et al., 2009).

The sum of species occupancy probabilities at a site is equal to the expected number of species present, and hence is a good estimator of species richness (Calabrese et al., 2014). One would expect richness not to be reliably estimated if the individual SDMs only estimate relative likelihoods of species occurrence. Yet, a number of studies have applied this methodology using PB SDMs (e.g. Pineda & Lobo, 2009; Schmidt-Lebuhn et al., 2012). There are also claims in the literature that, when calculating species richness, it is best to transform continuous SDM predictions into binary predictions prior to summing them (e.g. Newbold et al., 2009; Pineda & Lobo, 2009) or even that such transformation is simply “inevitable” (Trotta-Moreu & Lobo, 2010). We test whether these seemingly counterintuitive assertions are justified in the simulations described below.

We simulated the distribution of a set of 100 species, sampled them, fitted SDMs using PA and PB methods, and then aggregated the resulting SDMs to estimate species richness (details in Appendix S4). Here we report the results of a scenario where half of the species depend on one covariate and tend to have low prevalence, while the second half depend on a different covariate and tend to have high prevalence (results for other scenarios are presented in Table S4.2 in Appendix S4).

Only when the stacked SDMs predict true probabilities is the estimation of species richness unbiased (Fig. 5).
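The reasoning behind stacking can be checked numerically at a single site. The sketch below is a simplified illustration, not the simulation reported in Fig. 5: the 100 occupancy probabilities (50 species at 0.1 and 50 at 0.6), the threshold of 0.5, the number of draws and the seed are all assumed values chosen for clarity.

```python
import random
from typing import List

# Hypothetical occupancy probabilities of 100 species at one site:
# 50 low-prevalence species and 50 high-prevalence species.
PSI: List[float] = [0.1] * 50 + [0.6] * 50


def stacked_richness(outputs: List[float]) -> float:
    """Richness estimate obtained by summing continuous SDM outputs."""
    return sum(outputs)


def binary_richness(outputs: List[float], threshold: float) -> int:
    """Richness estimate obtained by thresholding each SDM before summing."""
    return sum(1 for p in outputs if p >= threshold)


def simulate_mean_richness(psi: List[float], n_draws: int, seed: int) -> float:
    """Average number of species present over repeated random draws of occupancy."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        total += sum(1 for p in psi if rng.random() < p)
    return total / n_draws


expected = stacked_richness(PSI)              # 50 * 0.1 + 50 * 0.6
thresholded = binary_richness(PSI, 0.5)       # counts every species with psi >= 0.5
simulated = simulate_mean_richness(PSI, n_draws=2000, seed=1)

print("sum of probabilities:", round(expected, 2))
print("sum after thresholding at 0.5:", thresholded)
print("mean simulated richness:", round(simulated, 2))
```

The sum of probabilities agrees with the simulated mean richness, while the thresholded stack counts every high-prevalence species as present and none of the low-prevalence ones, so it departs from the expected richness even though the underlying probabilities are correct.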
Attempts to estimate richness based on SDM outputs that only represent relative likelihoods of species occupancy are arbitrarily biased (in our example, richness is overestimated). Binary conversion of SDM outputs does not solve the problems associated with relative likelihoods and it is clearly detrimental when applied to SDMs that estimate probabilities (Fig. 5), as recently shown mathematically by Calabrese et al. (2014). Our simulated examples suggest that continuous (i.e. uncategorized) relative likelihoods may broadly capture relative patterns of species richness in some scenarios, even if they are unable to estimate absolute numbers of species (i.e. actual and estimated richness are correlated). This finding is compatible with results presented in the literature (e.g. Aranda & Lobo, 2011), although further research is needed before establishing how widely this pattern applies in real systems.

Case study 5: Spatial prioritization

SDMs are frequently used to prioritize areas for conservation actions such as reserve establishment or restoration. To date, multiple methods and software tools have been applied to spatial conservation planning, including sophisticated approaches based on algorithms that can solve highly complex spatial problems (Moilanen et al., 2009). The two most frequently used modes of planning are (i) the minimum set approach (Cocks & Baird, 1989), in which areas are selected in order to meet species-specific (or feature-specific) targets while minimizing the cost of the solution, and (ii) the maximum coverage/utility approach (Hof & Raphael, 1993; Camm et al., 1996) in which biodiversity benefits are maximized within given budgetary or area constraints. Here we explore the effect of SDM output scaling and binary conversion on reserve selection when using a target-based minimum set approach. We use a widely adopted planning tool, Marxan (Ball et al., 2009), as an example but the conclusions derived are equally valid for other tools within this group.
In Appendix S4 we present the analogous analysis using Zonation (Moilanen et al., 2012), as an example of maximum coverage/utility approaches.

We used SDMs fitted to real PA data for seven species from the Lower Hunter region in Australia (Wintle et al., 2005a), rescaled them to mimic PB methods that produce estimates of relative likelihoods (scaling factors between 0.3 and 0.9), and applied the thresholds above to obtain binary SDMs (details in Appendix S4). For the Marxan analysis, the amount of each conservation feature (species) contained in each cell was set to the prediction of the corresponding SDM. Two types of target were assessed: ‘proportional’ (conserve a set proportion of the area that each species occupies) and ‘absolute’ (conserve a set area occupied by each species). The absolute areas were chosen by multiplying the proportional targets by the total area occupied by each species according to the original PA-based SDMs. The planning solutions produced after applying scaling and binary conversion to the SDMs were compared with results obtained using the original SDMs, in terms of the amount of ‘true’ species distribution protected in each solution (the ‘true’ distributions were simulated based on the original SDM probabilities).

We found that planning solutions based on relative likelihoods are comparable to those based on true occurrence probabilities provided proportional targets are used (Fig. 6b). However, the minimum set approach over- or under-shoots true absolute targets when based on SDMs that estimate relative likelihoods (Fig. 6a). When SDMs underestimate prevalence (scaling<1), targets are met but solutions are more costly (e.g. the cost of achieving the targets was double in Fig. 6a). Conversely, if SDMs overestimate prevalence, the algorithm interprets that it is conserving more than it actually does, and conservation targets are missed.
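The contrast between proportional and absolute targets can be reproduced with a toy minimum set problem. The sketch below is a deliberately simplified stand-in, not Marxan and not the analysis behind Fig. 6: it assumes a single species, equal cell costs, a greedy selection rule, a hypothetical landscape of 100 cells, a 30% target, and scaling factors of 0.5 and 1.5.

```python
from typing import Dict, List, Tuple


def greedy_min_set(values: List[float], target: float) -> List[int]:
    """Select cells in decreasing order of SDM value until their sum reaches the target."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    chosen: List[int] = []
    total = 0.0
    for i in order:
        if total >= target:
            break
        chosen.append(i)
        total += values[i]
    return chosen


N_CELLS = 100
# Hypothetical 'true' occupancy probabilities (all below 0.5).
TRUE_PSI: List[float] = [0.5 * (i + 1) / (N_CELLS + 1) for i in range(N_CELLS)]
TRUE_AOO = sum(TRUE_PSI)
FRACTION = 0.3  # assumed conservation target: 30% of the distribution


def protected(chosen: List[int]) -> float:
    """Expected occupied area actually protected by a selection."""
    return sum(TRUE_PSI[i] for i in chosen)


# scale -> (cells chosen under proportional target,
#           cells chosen under absolute target,
#           true area protected under absolute target)
results: Dict[float, Tuple[int, int, float]] = {}
for scale in (1.0, 0.5, 1.5):
    sdm = [scale * p for p in TRUE_PSI]                      # rescaled SDM output
    prop = greedy_min_set(sdm, FRACTION * sum(sdm))          # target relative to the SDM itself
    absolute = greedy_min_set(sdm, FRACTION * TRUE_AOO)      # target fixed in true area units
    results[scale] = (len(prop), len(absolute), protected(absolute))

for scale, (n_prop, n_abs, area) in results.items():
    print(scale, n_prop, n_abs, round(area, 2), "target:", round(FRACTION * TRUE_AOO, 2))
```

Under the proportional target the same cells are selected whatever the scaling, because the constant cancels. Under the absolute target, the downscaled output selects many more cells than needed and the upscaled output selects too few to protect the intended area.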
Binary conversion of SDM outputs leads to a reserve system that fails to meet true targets and/or is more costly, even if proportional targets are used (Fig. 6c-d): the information loss ultimately results in expending excessive resources on some species while under-representing others.

Targets and achieved levels of conservation disagree more when absolute targets are used.

Although we have shown that the use of relative likelihoods does not affect the planning solution when targets (or algorithms) are based on proportions of species distributions, it is important to realize that the true area occupied by species in reserves will remain unknown. This limitation may have implications in ecological, biogeographical and conservation applications. For example, one will not be able to judge whether the occupied area within a reserve is large enough to represent a viable population of the species.

DISCUSSION

We have shown how survey data type, sampling bias and imperfect detection interact to determine the quantity that is ultimately estimated by an SDM (Fig. 1), and the implications of reducing SDM outputs to a binary categorization. Each of these issues has been discussed in the literature (e.g. Phillips et al., 2009; Phillips & Elith, 2013; Calabrese et al., 2014; Lahoz-Monfort et al., 2014), but they have generally been addressed as separate problems, with little or no evaluation of their implications for particular applications. We have integrated these issues into a single framework to aid interpretation and explicit consideration of the implications for the most common applications of SDM. Our simulations demonstrate that these issues really matter and that misguided or inefficient management and inference may result if they are ignored.
SDMs, data and decision-theory

It has recently been advocated that the modeling of species distributions should be carried out within a structured and transparent decision-making framework (Guisan et al., 2013). Formulating a clear objective is the starting point for such a process. To ensure that an SDM ultimately fits its purpose one must first ask “what is my SDM to be used for?” Here we argue that a critical consideration is whether the type of information demanded by the application in question is available or could be obtained. In practice, this requires evaluation of whether the SDM output needs to provide estimates of actual occupancy probabilities, or whether relative likelihoods, or the ranking of sites, will suffice. Our assessment shows that, while learning about the relative likelihood of species occupancy in different sites is sufficient for some applications (e.g. spatial prioritization when targets are expressed as proportion of distributions, Fig. 6; or cost-sharing agreements about invasive species management, Appendix S3), for many other applications information about the actual probability that a site contains a species is crucial (e.g. species prioritization according to the area occupied, Fig. 3; or estimation of species richness, Fig. 5). One should also evaluate whether the data (existing or to be collected) can be assumed to be free of considerable biases or, otherwise, whether the type of data allows unbiased estimation when used in conjunction with appropriate modeling methods. Thinking about the type of output that an SDM is expected to provide for a given application is also important from the point of view of model evaluation, as it directly relates to the metrics that should be used to assess its performance (Lawson et al., 2014).
Where an application appears to require only knowledge about relative likelihoods, it is essential to consider carefully whether SDM predictions are indeed comparable within that specific context. For instance, assuming detectability and sampling effort do not vary spatially, predictions from a PB model represent relative likelihoods and these are comparable across space. However, comparisons across time or between species are not valid because prevalence will differ across SDMs and cannot be estimated from PB data: the predicted SDM values in a given location are relative to other locations modeled, but are not comparable across models fitted for different species or over different time periods. Thus, even if relative likelihood predictions for two species happen to have the same value at a given location, it does not follow that both species have the same probability of occurrence at that location. The same reasoning applies to temporal comparisons. In general, this rules out the use of SDMs fitted to PB data in monitoring temporal trends in species' area of occupancy (AOO; Fig. 2), or in the prioritization of a set of species according to their AOOs (Fig. 3). Trends in AOO can be explored with PB data when projecting an SDM to changing conditions, but not when separate models are fitted to capture changing relationships with the predictors.

Binary conversion of SDM predictions

Our case studies show how the conversion of SDM predictions into discrete categories diminishes the value of SDMs for a range of applications such as spatial prioritization (Fig. 6), species richness estimation (Fig. 5), or applications that require estimating species AOO (Figs. 2 and 3). Our results support other recent studies that argue against the discretization of SDM outputs for specific applications (e.g. Calabrese et al., 2014; Lawson et al., 2014). In short, the continuous outputs of SDMs provide richer information than discrete representations and therefore the discretization of SDM outputs is detrimental in most applications.

The commonly-held belief that applications require binary SDM outputs may stem partly from unrealistic expectations about what an SDM can, or should, provide.
In an ideal world, an SDM would provide perfect discrimination between sites at which species are present and absent, leading to a binary map of true occurrence. However, data limitations (a sample of observations) and the inherently stochastic nature of site occupancy (i.e., the fact that reality tends to be too complex to be categorically described by measurable covariates in a statistical model) mean that SDMs can only be expected to provide probabilistic predictions of species occurrence. Transforming such probabilistic outputs into binary maps using a threshold does not provide an estimate of site occupancy that is closer to the true presences and absences; on the contrary, discretization degrades the information available in SDM predictions (Fig. S2.2 in Appendix S2). Some studies have used binary conversion in an attempt to get around the problem of not being able to estimate prevalence in PB SDMs (e.g. Pineda & Lobo, 2009; Liu et al., 2013). Our simulations confirm that simplifying PB-based SDM outputs into binary categorizations does not solve this problem (Figs. 2, 3, 5 and 6).

Our review of SDM applications indicates that, although binary conversion is a relatively widespread practice, SDM outputs should rarely be converted into binary maps (Appendix S3). In this regard, we again emphasize the need to use SDMs within a structured decision-making framework. Specifying exactly how SDM outputs are to be used for particular applications helps clarify whether binary conversion is actually necessary or beneficial, and which threshold is most appropriate, if any. For instance, a management body may decide to identify “critical habitat” for a species to grant relevant sites special protection. In this case, the objective involves classifying the habitat into two categories (protected or not), hence binary conversion is fully justified by the objective.
Yet, not all conversions would be equally acceptable: the objective should be articulated to provide the rules to select the appropriate threshold. For example, if the aim were to protect 90% of a species’ distribution, the threshold that achieved a sensitivity of 0.9 should be used. However, if the aim were instead to protect the n best sites for the species dictated by the available budget, a different threshold would be applied.

Future prospects and recent developments

Due to data availability, PB methods are frequently used despite their limitations. Future efforts should be directed towards the collection of more informative survey data, which will lead to SDMs that can support a wider range of applications in ecology, biogeography and conservation. A first step is to ensure that non-detections are recorded. Wider incorporation of non-detections into databases that bring together observations from different sources will render them much more useful, as this addresses the problem of sampling bias (Feeley & Silman, 2011) and allows the estimation of species prevalence. A greater appreciation of the limitations of PB data should also lead to careful investigation of existing datasets, to assess whether absence data can actually be extracted from raw survey data. Where PB data do not deliver the information required, they might still be usefully integrated with limited PA data to estimate distributions better than simply using PA data alone (Dorazio, 2014; Fithian et al., in press).

Collecting data in ways that allow dealing with imperfect detection should also be an important component in designing future sampling protocols, while available data sources should be carefully examined to evaluate whether they allow doing so (e.g. Kéry et al., 2010). Statistical methods that explicitly model the observation process (i.e. state-space models) are valuable tools for analyzing ecological data (Guillera-Arroita et al., 2014b; King, 2014).
Apart from the development of models of species distributions that account for possible false absence records, recent statistical advances include extensions to model range dynamics and multiple occupancy states (Bailey et al., in press). Methods have also been developed to deal with false positive records which, although less prevalent, can be relevant for some types of surveys (Miller et al., 2011). In summary, to reliably infer the distribution of a species, steps need to be taken during survey design, data collection and analysis to minimize the effects of the sampling process.

Conclusion

Many issues need to be considered when attempting to build useful ecological models. As we have shown, a fundamental question in the context of SDM is whether the type and quality of data available are suitable to produce the information required for a given application. Not all data types or SDM approaches are valid for all applications in ecology, biogeography and conservation, and using inadequate SDM outputs may lead to misguided decisions and suboptimal use of resources. For some applications, learning about which variables are good predictors of a species distribution is sufficient; for many others, however, knowledge about the prevalence of the species is required. Our results demonstrate that binary conversion of SDM outputs does not solve this problem, and should only be carried out when clearly justified by the application’s objective. Framing the use of SDMs within a structured decision-making context helps to ensure that the resulting model is fit for its intended purpose.

ACKNOWLEDGEMENTS

This work was supported by the Australian Research Council (ARC) Centre of Excellence for Environmental Decisions, the National Environment Research Program (NERP) Environmental Decisions Hub, and ARC Future Fellowships to JE, MM and BW.
The authors thank Laura Pollock for helpful discussion regarding species distribution modelling for phylogenetic studies, and Janet Franklin, an anonymous reviewer, and the editors for comments that improved the quality of the manuscript.

REFERENCES

Akcakaya, H.R., McCarthy, M.A. & Pearce, J.L. (1995) Linking landscape data with population viability analysis - Management options for the Helmeted Honeyeater Lichenostomus melanops cassidix. Biological Conservation, 73, 169-176.
Aranda, S.C. & Lobo, J.M. (2011) How well does presence-only-based species distribution modelling predict assemblage diversity? A case study of the Tenerife flora. Ecography, 34, 31-38.
Bailey, L.L., MacKenzie, D.I. & Nichols, J.D. (in press) Advances and applications of occupancy models. Methods in Ecology and Evolution, doi: 10.1111/2041-210X.12100.
Bekessy, S.A., Wintle, B.A., Gordon, A., Fox, J.C., Chisholm, R., Brown, B., Regan, T., Mooney, N., Read, S.M. & Burgman, M.A. (2009) Modelling human impacts on the Tasmanian wedge-tailed eagle (Aquila audax fleayi). Biological Conservation, 142, 2438-2448.
Booth, T.H., Nix, H.A., Busby, J.R. & Hutchinson, M.F. (2014) Bioclim: the first species distribution modelling package, its early applications and relevance to most current MaxEnt studies. Diversity and Distributions, 20, 1-9.
Busby, J.R. (1991) BIOCLIM - a bioclimate analysis and prediction system. Plant Protection Quarterly, 6, 8-9.
Calabrese, J.M., Certain, G., Kraan, C. & Dormann, C.F. (2014) Stacking species distribution models and adjusting bias by linking them to macroecological models. Global Ecology and Biogeography, 23, 99-112.
Camm, J.D., Polasky, S., Solow, A. & Csuti, B. (1996) A note on optimal algorithms for reserve site selection. Biological Conservation, 78, 353-355.
Chakraborty, A., Gelfand, A.E., Wilson, A.M., Latimer, A.M. & Silander, J.A. (2011) Point pattern modelling for degraded presence-only data over large regions. Journal of the Royal Statistical Society Series C-Applied Statistics, 60, 757-776.
Chen, G., Kéry, M., Plattner, M., Ma, K. & Gardner, B. (2013) Imperfect detection is the rule rather than the exception in plant distribution studies. Journal of Ecology, 101, 183-191.
Cocks, K.D. & Baird, I.A. (1989) Using mathematical-programming to address the multiple reserve selection problem - an example from the Eyre Peninsula, South-Australia. Biological Conservation, 49, 113-130.
Dorazio, R.M. (2012) Predicting the geographic distribution of a species from presence-only data subject to detection errors. Biometrics, 68, 1303-1312.
Dorazio, R.M. (2014) Accounting for imperfect detection and survey bias in statistical analysis of presence-only data. Global Ecology and Biogeography, doi: 10.1111/geb.12216.
Elith, J. & Franklin, J. (2013) Species distribution modeling. Encyclopedia of Biodiversity (ed. by S.A. Levin), pp. 692-705. Academic Press, Waltham, MA.
Elith, J., Phillips, S.J., Hastie, T., Dudik, M., Chee, Y.E. & Yates, C.J. (2011) A statistical explanation of MaxEnt for ecologists. Diversity and Distributions, 17, 43-57.
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberon, J., Williams, S., Wisz, M.S. & Zimmermann, N.E. (2006) Novel methods improve prediction of species' distributions from occurrence data. Ecography, 29, 129-151.
Feeley, K.J. & Silman, M.R. (2011) Keep collecting: accurate species distribution modelling requires more collections than previously thought. Diversity and Distributions, 17, 1132-1140.
Ferrier, S. & Watson, G. (1997) An evaluation of the effectiveness of environmental surrogates and modelling techniques in predicting the distribution of biological diversity. Environment Australia. Canberra. 184pp.
Ferrier, S. & Guisan, A. (2006) Spatial modelling of biodiversity at the community level. Journal of Applied Ecology, 43, 393-404.
Ferrier, S., Watson, G., Pearce, J. & Drielsma, M. (2002) Extended statistical approaches to modelling spatial pattern in biodiversity in northeast New South Wales. I. Species-level modelling. Biodiversity and Conservation, 11, 2275-2307.
Fithian, W. & Hastie, T. (2013) Finite-sample equivalence in statistical models for presence-only data. Annals of Applied Statistics, in press.
Fithian, W., Elith, J., Hastie, T. & Keith, D. (in press) Bias Correction in Species Distribution Models: Pooling Survey and Collection Data for Multiple Species. Methods in Ecology and Evolution.
Franklin, J. (2013) Species distribution models in conservation biogeography: developments and challenges. Diversity and Distributions, 19, 1217-1223.
Garrard, G.E., Bekessy, S.A., McCarthy, M.A. & Wintle, B.A. (2008) When have we looked hard enough? A novel method for setting minimum survey effort protocols for flora surveys. Austral Ecology, 33, 986-998.
Gu, W. & Swihart, R.H. (2004) Absent or undetected? Effects of non-detection of species occurrence on wildlife–habitat models. Biological Conservation, 116, 195-203.
Guillera-Arroita, G. & Lahoz-Monfort, J.J. (2012) Designing studies to detect differences in species occupancy: power analysis under imperfect detection. Methods in Ecology and Evolution, 3, 860-869.
Guillera-Arroita, G., Hauser, C.E. & McCarthy, M.A. (2014a) Optimal surveillance strategy for invasive species management when surveys stop after detection. Ecology and Evolution, 4, 1751-1760.
Guillera-Arroita, G., Lahoz-Monfort, J.J. & Elith, J. (in press) Maxent is not a presence-absence method: a comment on Thibaud et al. Methods in Ecology and Evolution.
Guillera-Arroita, G., Morgan, B.J.T., Ridout, M.S. & Linkie, M. (2011) Species occupancy modeling for detection data collected along a transect. Journal of Agricultural, Biological, and Environmental Statistics, 16, 301-317.
Guillera-Arroita, G., Lahoz-Monfort, J.J., MacKenzie, D.I., Wintle, B.A. & McCarthy, M.A. (2014b) Ignoring imperfect detection in biological surveys is dangerous: a response to ‘Fitting and Interpreting Occupancy Models’. PLoS ONE, 9, e99571.
Guisan, A. & Thuiller, W. (2005) Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8, 993-1009.
Guisan, A., Tingley, R., Baumgartner, J.B., Naujokaitis-Lewis, I., Sutcliffe, P.R., Tulloch, A.I.T., Regan, T.J., Brotons, L., McDonald-Madden, E., Mantyka-Pringle, C., Martin, T.G., Rhodes, J.R., Maggini, R., Setterfield, S.A., Elith, J., Schwartz, M.W., Wintle, B.A., Broennimann, O., Austin, M., Ferrier, S.,

1 2 3 748 Kearney, M.R., Possingham, H.P. & Buckley, Y.M. (2013) Predicting species 4 749 distributions for conservation decisions. Ecology Letters , 16 , 1424-1435. 5 750 Hastie, T. & Fithian, W. (2013) Inference from presence-only data; the ongoing 6 751 controversy. Ecography , 36 , 864-867. 7 752 Hauser, C.E. & McCarthy, M.A. (2009) Streamlining 'search and destroy': cost- 8 9 753 effective surveillance for invasive species management. Ecology Letters , 12 , 10 754 683-692. 11 755 Herbert, K., Curran, I., Hauser, C., Hester, S., Kirkwood, A., Derrick, B. & King, D. 12 756 (2013) All eyes focussed on hawkweed eradication in the Victorian Alps - a 13 757 partnerships approach. Proceedings of the 17th NSW Weeds Conference , 14 758 Corowa, Australia. 15 759 Hof, J.G. & Raphael, M.G. (1993) Some mathematical-programming approaches for 16 760 optimizing timber age-class distributions to meet multispecies wildlife 17 761 population objectives. Canadian Journal of Forest Research-Revue 18 For Peer Review 19 762 Canadienne De Recherche Forestiere , 23 , 828-834. 20 763 IUCN (2012) IUCN Red List Categories and Criteria: Version 3.1. Second edition. , 21 764 Gland, Switzerland and Cambridge, UK: IUCN. iv + 32pp. 22 765 Jiménez-Valverde, A. & Lobo, J.M. (2007) Threshold criteria for conversion of 23 766 probability of species presence to either–or presence–absence. Acta 24 767 Oecologica , 31 , 361-369. 25 768 Keith, D.A., Elith, J. & Simpson, C.C. (2014) Predicting distribution changes of a 26 769 mire ecosystem under future climates. Diversity and Distributions , 20 , 440- 27 770 454. 28 29 771 Keith, D.A., Akcakaya, H.R., Thuiller, W., Midgley, G.F., Pearson, R.G., Phillips, 30 772 S.J., Regan, H.M., Araujo, M.B. & Rebelo, T.G. (2008) Predicting extinction 31 773 risks under climate change: coupling stochastic population models with 32 774 dynamic bioclimatic habitat models. Biology Letters , 4, 560-563. 33 775 Kéry, M. 
(2002) Inferring the absence of a species: a case study of snakes. Journal of 34 776 Wildlife Management , 66, 330-338. 35 777 Kéry, M. (2011) Towards the modelling of true species distributions. Journal of 36 778 Biogeography , 38 , 617-618. 37 779 Kéry, M., Gardner, B. & Monnerat, C. (2010) Predicting species distributions from 38 39 780 checklist data using site-occupancy models. Journal of Biogeography , 37 , 40 781 1851-1862. 41 782 Kéry, M., Guillera-Arroita, G. & Lahoz-Monfort, J.J. (2013) Analysing and mapping 42 783 species range dynamics using occupancy models. Journal of Biogeography , 43 784 40 , 1463-1474. 44 785 King, R. (2014) Statistical Ecology. Annual Review of Statistics and its Application , 45 786 401-426. 46 787 Lahoz-Monfort, J.J., Guillera-Arroita, G. & Wintle, B.A. (2014) Imperfect detection 47 788 impacts the performance of species distribution models. Global Ecology and 48 49 789 Biogeography , 23 , 504-515. 50 790 Lancaster, T. & Imbens, G. (1996) Case-control studies with contaminated controls. 51 791 Journal of Econometrics , 71 , 145-160. 52 792 Lawson, C.R., Hodgson, J.A., Wilson, R.J. & Richards, S.A. (2014) Prevalence, 53 793 thresholds and the performance of presence–absence models. Methods in 54 794 Ecology and Evolution , 5, 54-64. 55 795 Lele, S.R. & Keim, J.L. (2006) Weighted distributions and estimation of resource 56 796 selection probability functions. Ecology , 87 , 3021-3028. 57 58 59 60 32 Page 33 of 93 Global Ecology and Biogeography

1 2 3 797 Lele, S.R., Merrill, E.H., Keim, J. & Boyce, M.S. (2013) Selection, use, choice and 4 798 occupancy: clarifying concepts in resource selection studies. Journal of 5 799 Ecology , 82 , 1183-1191. 6 800 Li, W. & Guo, Q. (2013) How to assess the prediction accuracy of species presence– 7 801 absence models without absence data? Ecography , 36 , 788-799. 8 9 802 Liu, C., Berry, P.M., Dawson, T.P. & Pearson, R.G. (2005) Selecting thresholds of 10 803 occurrence in the prediction of species distributions. Ecography , 28 , 385-393. 11 804 Liu, C.R., White, M. & Newell, G. (2013) Selecting thresholds for the prediction of 12 805 species occurrence with presence-only data. Journal of Biogeography , 40 , 13 806 778-789. 14 807 Lizzio, J., Richmond, L., Mewett, O., Hennecke, B., Baker, J. & Raphael, B. (2009) 15 808 Methodology to prioritise Weeds of National Significance (WoNS) 16 809 candidates. In. Bureau of Rural Sciences, Canberra. 17 810 MacKenzie, D.I. & Royle, J.A. (2005) Designing occupancy studies: general advice 18 For Peer Review 19 811 and allocating survey effort. Journal of Applied Ecology , 42 , 1105-1114. 20 812 MacKenzie, D.I., Nichols, J.D., Lachman, G.B., Droege, S., Royle, J.A. & Langtimm, 21 813 C.A. (2002) Estimating site occupancy rates when detection probabilities are 22 814 less than one. Ecology , 83 , 2248-2255. 23 815 MacKenzie, D.I., Nichols, J.D., Royle, J.A., Pollock, K.H., Bailey, L.L. & Hines, J.E. 24 816 (2006) Occupancy Estimation and Modeling: Inferring Patterns and 25 817 Dynamics of Species Occurrence . Academic Press, New York. 26 818 Merow, C. & Silander, J.A. (2014) A comparison of Maxlike and Maxent for 27 819 modelling species distributions. Methods in Ecology and Evolution , 5, 215- 28 29 820 225. 30 821 Merow, C., Smith, M.J. & Silander, J.A. (2013) A practical guide to MaxEnt for 31 822 modeling species' distributions: what it does, and why inputs and settings 32 823 matter. Ecography , 36 , 1058-1069. 
33 824 Miller, D.A., Nichols, J.D., McClintock, B.T., Campbell Grant, E.H., Bailey, L.L. & 34 825 Weir, L.A. (2011) Improving occupancy estimation when two types of 35 826 observation error occur: non-detection and species misidentification. Ecology , 36 827 92 , 1422-1428. 37 828 Moilanen, A., Wilson, K.A. & Possingham, H.P. (2009) Spatial Conservation 38 39 829 Prioritization. Quantitative Methods & Computational Tools. Oxford 40 830 University Press, New York. 41 831 Moilanen, A., Franco, A.M.A., Early, R.I., Fox, R., Wintle, B. & Thomas, C.D. 42 832 (2005) Prioritizing multiple-use landscapes for conservation: methods for 43 833 large multi-species planning problems. Proceedings of the Royal Society B- 44 834 Biological Sciences , 272 , 1885-1891. 45 835 Moilanen, A., Meller, L., Leppänen, J., Montesino Pouzols, F., Arponen, A. & Kujala, 46 836 H. (2012) Zonation: Spatial conservation planning framework and software 47 837 version 3.1 user manual. Helsingin Yliopisto, Helsinki. 48 49 838 Newbold, T. (2010) Applications and limitations of museum data for conservation 50 839 and ecology, with particular attention to species distribution models. Progress 51 840 in Physical Geography , 34 , 3-22. 52 841 Newbold, T., Gilbert, F., Zalat, S., El-Gabbas, A. & Reader, T. (2009) Climate-based 53 842 models of spatial patterns of species richness in Egypt's butterfly and mammal 54 843 fauna. Journal of Biogeography , 36 , 2085-2095. 55 844 Parviainen, M., Marmion, M., Luoto, M., Thuiller, W. & Heikkinen, R.K. (2009) 56 845 Using summed individual species models and state-of-the-art modelling 57 58 59 60 33 Global Ecology and Biogeography Page 34 of 93

1 2 3 846 techniques to identify threatened plant species hotspots. Biological 4 847 Conservation , 142 , 2501-2509. 5 848 Pearce, J. & Lindenmayer, D. (1998) Bioclimatic analysis to enhance reintroduction 6 849 biology of the endangered helmeted honeyeater (Lichenostomus melanops 7 850 cassidix) in southeastern Australia. Restoration Ecology , 6, 238-243. 8 9 851 Pearce, J. & Ferrier, S. (2000) Evaluating the predictive performance of habitat 10 852 models developed using logistic regression. Ecological Modelling , 133 , 225- 11 853 245. 12 854 Phillips, S.J. & Dudík, M. (2008) Modeling of species distributions with Maxent: new 13 855 extensions and a comprehensive evaluation. Ecography , 31 , 161-175. 14 856 Phillips, S.J. & Elith, J. (2013) On estimating probability of presence from use- 15 857 availability or presence-background data. Ecology , 94 , 1409-1419. 16 858 Phillips, S.J., Anderson, R.P. & Schapire, R.E. (2006) Maximum entropy modelling 17 859 of species geographic distributions. Ecological Modelling , 190 , 231–259. 18 For Peer Review 19 860 Phillips, S.J., Dudik, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J. & 20 861 Ferrier, S. (2009) Sample selection bias and presence-only distribution 21 862 models: implications for background and pseudo-absence data. Ecological 22 863 Applications , 19 , 181-197. 23 864 Pineda, E. & Lobo, J.M. (2009) Assessing the accuracy of species distribution models 24 865 to predict amphibian species richness patterns. Journal of Animal Ecology , 78 , 25 866 182-190. 26 867 Renner, I.W. & Warton, D.I. (2013) Equivalence of MAXENT and Poisson Point 27 868 Process Models for Species Distribution Modeling in Ecology. Biometrics , 69 , 28 29 869 274-281. 30 870 Royle, J.A., Chandler, R.B., Yackulic, C. & Nichols, J.D. (2012) Likelihood analysis 31 871 of species occurrence probability from presence-only data for modelling 32 872 species distributions. Methods in Ecology and Evolution , 3, 545-554. 
33 873 Schmidt-Lebuhn, A.N., Knerr, N.J. & Gonzalez-Orozco, C.E. (2012) Distorted 34 874 perception of the spatial distribution of plant diversity through uneven 35 875 collecting efforts: the example of in Australia. Journal of 36 876 Biogeography , 39 , 2072-2080. 37 877 Stauffer, H.B., Ralph, C.J. & Miller, S.L. (2002) Incorporating detection uncertainty 38 39 878 into presence-absence surveys for marbled murrelet. Predicting Species 40 879 Occurrences: Issues of Accuracy and Scale , 357-365. 41 880 Trotta-Moreu, N. & Lobo, J.M. (2010) Deriving the species richness distribution of 42 881 Geotrupinae (Coleoptera: Scarabaeoidea) in Mexico from the overlap of 43 882 individual model predictions. Environmental Entomology , 39 , 42-49. 44 883 Tyre, A.J., Tenhumberg, B., Field, S.A., Niejalke, D., Parris, K. & Possingham, H.P. 45 884 (2003) Improving precision and reducing bias in biological surveys: 46 885 estimating false-negative error rates. Ecological Applications , 13 , 1790-1801. 47 886 Warton, D.I. & Shepherd, L.C. (2010) Poisson point process models solve the 48 49 887 "pseudo-absence problem" for presence-only data in ecology. Annals of 50 888 Applied Statistics , 4, 1383-1402. 51 889 Warton, D.I., Renner, I.W. & Ramp, D. (2013) Model-based control of observer bias 52 890 for the analysis of presence-only data in ecology. Plos One , 8, e79168. 53 891 Wintle, B.A., Elith, J. & Potts, J.M. (2005a) Fauna habitat modelling and mapping: A 54 892 review and case study in the Lower Hunter Central Coast region of NSW. 55 893 Austral Ecology , 30 , 719-738. 56 57 58 59 60 34 Page 35 of 93 Global Ecology and Biogeography

1 2 3 894 Wintle, B.A., Kavanagh, R.P., McCarthy, M.A. & Burgman, M.A. (2005b) 4 895 Estimating and dealing with detectability in occupancy surveys for forest owls 5 896 and arboreal marsupials. Journal of Wildlife Management , 69 , 905-917. 6 897 Yackulic, C.B., Chandler, R., Zipkin, E.F., Royle, J.A., Nichols, J.D., Grant, E.H.C. 7 898 & Veran, S. (2013) Presence-only modelling using MAXENT: when can we 8 9 899 trust the inferences? Methods in Ecology and Evolution , 4, 236-243. 10 900 Yoccoz, N.G., Nichols, J.D. & Boulinier, T. (2001) Monitoring of biological diversity 11 901 in space and time. Trends in Ecology & Evolution , 16 , 446-453. 12 902 13 903 Additional references to reviewed papers for the literature review and assessment of 14 15 904 applications can be found at the end of Appendices S1 and S3 at [insert URL here] . 16 17 905 18 For Peer Review 19 20 906 BIOSKETCH 21 22 907 The authors belong to the ARC Centre of Excellence for Environmental Decisions 23 24 908 (www.ceed.edu.au). They share an interest in the construction of species distribution 25 26 909 models and their application to inform environmental decisions, as well as in research 27 28 29 910 on methodological aspects of such models. Their collective experience covers a wide 30 31 911 range of SDM applications such as wildlife monitoring, invasive species 32 33 912 management, population viability analysis and spatial conservation planning, as well 34 35 913 as environmental decision analysis more generally. 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 35 Global Ecology and Biogeography Page 36 of 93

Table 1 Adequacy of SDM data type and impact of binary conversion for the set of applications chosen as case studies. For a detailed assessment of a wider range of applications see Appendix S3.

Application: Calculate AOO or compare temporal changes in AOO, to monitor or assess status of a species. (Case study 1)
Domain: Management of invasive or threatened species
S/M: S
Consequences of biased estimation: Species status may be incorrectly assessed due to biased estimation of AOO or its temporal trends (real trend missed or spurious trend detected).
Is relative likelihood / ranking OK? Relative likelihoods and rankings are not able to estimate AOO, and are not sufficient input to track temporal trends (as they are not comparable across time, except for PA data when detectability is constant in space and time).
Has binary classification been used? Yes
Consequences of using a binary classification: Degrades AOO estimation, even when probabilities are estimated. It does not solve the inadequacy of relative likelihoods or rankings.

Application: Compare potential AOO for a list of species to prioritize management resources across species. (Case study 2)
Domain: Management of invasive species
S/M: M
Consequences of biased estimation: Management effort may be allocated toward species that actually have low potential for spread and vice versa.
Is relative likelihood / ranking OK? Relative likelihoods and rankings are not sufficient inputs because they are not comparable across species.
Has binary classification been used? Yes
Consequences of using a binary classification: Degrades AOO estimation, even when probabilities are estimated. It does not solve the inadequacy of relative likelihoods or rankings.

Application: Determine optimal surveillance to minimize monitoring plus management costs. (Case study 3)
Domain: Management of invasive species
S/M: S
Consequences of biased estimation: Overall costs will increase, as the level of surveillance guiding early management is not optimal.
Is relative likelihood / ranking OK? Computing expected costs requires probabilities; can result in a suboptimal solution otherwise.
Has binary classification been used? No
Consequences of using a binary classification: Not valid. The method is based on acknowledging uncertainty in species presence to evaluate trade-offs in expected costs.

Application: Estimate species richness (by stacking single-species SDMs) to identify biodiversity hotspots for prioritization, or for ecological or biogeographical inference. (Case study 4)
Domain: Spatial planning; Ecological/Biogeogr. inference
S/M: M
Consequences of biased estimation: Areas where the likelihood of observing species is highest (rather than the best habitats) may be incorrectly identified as biodiversity hotspots.
Is relative likelihood / ranking OK? Relative likelihood is not sufficient to estimate absolute richness. Relative values may lead to robust site rankings in terms of richness in some scenarios when comparing sites within the same modeled region.
Has binary classification been used? Yes
Consequences of using a binary classification: Degrades richness estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.

Application: Spatial prioritization: SDM as input layer for evaluating the constraint in a ‘minimum set’ approach, e.g. Marxan. (Case study 5)
Domain: Spatial planning
S/M: M
Consequences of biased estimation: Can lead to an under-performing prioritization, with some targets not met and/or a solution involving greater cost than needed. Prioritization will favor sites where the likelihood of observing species is highest, rather than the best sites.
Is relative likelihood / ranking OK? Relative likelihood is fine if targets are set in terms of the % of distribution to be retained, but not if set as absolute areas. A ranking that does not capture the correct shape can lead to an under-performing prioritization.
Has binary classification been used? Yes
Consequences of using a binary classification: Resulting information loss may lead to an under-performing prioritization. Effect more pronounced when targets are set as absolute areas (as opposed to % of distributions).

“AOO” refers to “area of occupancy”. S/M indicates single or multiple species (this refers to whether the calculations involve SDMs from several species, e.g. by combining or prioritizing them; we do not consider as multi-species those applications where, although several species may be considered, calculations are independent for each of them).
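As a rough numerical illustration of the first row of Table 1, the sketch below computes AOO from a set of occurrence probabilities, from a thresholded (binary) version of them, and from a relative likelihood rescaled to a maximum of one. All values, the fixed 0.5 threshold and the function names are hypothetical and chosen only for illustration; they are not taken from the case studies.

```python
def aoo_from_probabilities(probs):
    """Expected fraction of sites occupied: the mean occurrence probability."""
    return sum(probs) / len(probs)


def aoo_from_binary(probs, threshold):
    """Fraction of sites classified as 'present' after applying a threshold."""
    return sum(1 for p in probs if p >= threshold) / len(probs)


def rescale_to_max_one(values):
    """Mimic a relative output whose absolute scale is unknown (assumption)."""
    top = max(values)
    return [v / top for v in values]


# Hypothetical occurrence probabilities at six sites, at two time steps.
# Between t1 and t2 the probability halves at every site (a real decline).
psi_t1 = [0.875, 0.75, 0.625, 0.375, 0.25, 0.125]
psi_t2 = [0.5 * p for p in psi_t1]

true_trend = (aoo_from_probabilities(psi_t1), aoo_from_probabilities(psi_t2))
binary_trend = (aoo_from_binary(psi_t1, 0.5), aoo_from_binary(psi_t2, 0.5))
relative_trend = (
    aoo_from_probabilities(rescale_to_max_one(psi_t1)),
    aoo_from_probabilities(rescale_to_max_one(psi_t2)),
)

print("probabilities:", true_trend)      # decline from 0.5 to 0.25
print("binary (0.5): ", binary_trend)    # decline exaggerated: 0.5 to 0.0
print("relative:     ", relative_trend)  # same value at both times: decline missed
```

In this toy setting the mean of the probabilities tracks the halving of AOO, the binary map exaggerates it, and the rescaled relative output shows no change at all, which is the sense in which relative values are "not comparable across time".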

FIGURE LEGENDS

Figure 1 (a) Synthesis of how the type of survey data interacts with sampling bias and imperfect detection to determine what a correlative species distribution model can estimate. Dark arrows denote the default level of information that can be achieved with each type of survey data (PA: presence-absence, PB: presence-background and DET: occupancy-detection). Light arrows indicate under which conditions higher levels of information can be achieved from those data types. ψ denotes the probability of species occurrence at a site, and p* the probability of detecting the species at a site where present (given all the survey effort applied per site). Column 1 refers to the probability/likelihood of observing the species. Columns 2-4 refer to information about species occurrence (i.e. distribution). What we represent for convenience as distinct quantities (columns) correspond in practice to a gradation (e.g. a bias may be negligibly small). (b) Graphical example of the different levels of information represented by each column in panel (a), assuming a single environmental covariate for simplicity. Taking the probability of occurrence in (4) as a reference, the relative likelihood of occurrence (3) is proportional to it, while (2) is not proportional but would rank sites in the same order. The relative likelihood or probability of observation (1) may not even provide a good ranking in terms of occurrence. Although for simplicity we present this figure from the point of view of the estimation of probabilities of species occurrence at sites, the general ideas apply as well when the focus of estimation is the intensity of points (observations) in the landscape (i.e. from point process models). Only the terminology is slightly different; for instance, instead of probability or relative likelihood of occurrence, we would refer to absolute/relative intensity of species occurrence.
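The four levels of information described for panel (b) can be mimicked numerically. The following is a minimal sketch under stated assumptions: the logistic response, the scaling constant, the cubic transform and the detectability values are all made up for illustration, and the helper names are hypothetical.

```python
import math


def occurrence_probability(x):
    """Hypothetical occurrence probability: logistic in a single covariate x."""
    return 1.0 / (1.0 + math.exp(-(x - 0.5) * 6.0))


def ranks(values):
    """Rank position (0 = lowest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


covariate = [0.1, 0.3, 0.5, 0.7, 0.9]  # covariate value at five sites

# (4) probability of occurrence
level4 = [occurrence_probability(x) for x in covariate]
# (3) relative likelihood: proportional to (4), with an unknown constant (0.2 here)
level3 = [0.2 * p for p in level4]
# (2) ranking only: a monotone but non-proportional transform of (4)
level2 = [p ** 3 for p in level4]
# (1) probability of observation: occurrence times an assumed detectability
#     that declines along the same gradient
detectability = [0.9, 0.8, 0.5, 0.2, 0.1]
level1 = [p * d for p, d in zip(level4, detectability)]

ratios = [a / b for a, b in zip(level3, level4)]
is_proportional = max(ratios) - min(ratios) < 1e-12
level2_keeps_ranking = ranks(level2) == ranks(level4)
level1_keeps_ranking = ranks(level1) == ranks(level4)

print(is_proportional, level2_keeps_ranking, level1_keeps_ranking)
```

With these assumed values, level (3) keeps a constant ratio to (4), level (2) keeps only the site ordering, and level (1) reorders the sites because detectability varies in space.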

Figure 2 Wildlife monitoring case study. The rows show the true and estimated area of occupancy (AOO, %) for a virtual species as a function of time: row 1 = true AOO; row 2 = AOO derived from a presence-absence (PA) model; row 3 = AOO derived from a presence-background (PB) model. In column 1 the red horizontal lines report true AOO, and each bar summarizes the result of 20 simulations (10th-90th percentiles), with colors representing how the species distribution model (SDM) output was used (black: no threshold, blue: ‘SS=SP’ threshold, green: ‘max(SS+SP)’ threshold; see threshold definitions in the main text and similar results for other thresholds in Appendix S4). Columns 2-4 show the estimated distributions for one of the simulations for the three time steps (t1, t2 and t3), as well as the “true” distribution model. PB SDMs are not able to track the species decline. Using thresholds does not solve the problem, and is detrimental for estimation based on PA data.

Figure 3 Invasive species prioritization case study. Estimated area of occupancy (AOO, %) for 25 virtual species as a function of their true AOO, based on the output of presence-absence (PA) and presence-background (PB) species distribution models (SDMs; first and second row, respectively). In the first column, the SDM output is used without a threshold. In the second and third columns the SDM output is first converted to a binary output using a threshold (‘SS=SP’ or ‘max(SS+SP)’; see threshold definitions in the main text and similar results for other thresholds in Appendix S4). The PB SDM does not prioritize species correctly. Binary conversion did not solve the problem, and was detrimental for estimation based on PA data.

Figure 4 Optimal surveillance of invasive species case study. Rows correspond to three scenarios of species occurrence information used to determine the optimal amount of surveillance: row 1 = “true” occurrence probabilities; row 2 = species distribution model (SDM) obtained from presence-absence (PA) data; row 3 = SDM obtained from presence-background (PB) data. The first column displays the species distribution model (true/estimated occurrence probability or estimated likelihood of occurrence; note the difference in scale). The second column displays the optimal survey length at each site, determined by taking the maps in the first row as true occurrence probabilities. The third column displays the ratio of total costs (survey + management) between the case with surveys as determined above and a case with no surveys. Blue indicates that surveys lead to total cost savings, while red indicates that surveying increases total costs. Applying the optimal level of surveillance in this example would reduce total costs to about 74%. Applying the level of surveillance obtained by wrongly interpreting the output of the PB SDM as a probability would lead to an increase in total costs (124%).

Figure 5 Species richness case study. Plots display estimated versus true richness for 500 sites, based on stacking presence-absence (PA) and presence-background (PB) species distribution models (SDMs; first and second row, respectively). A total of 100 species were simulated. In the first column, estimated richness is obtained as the sum of the continuous output of the SDMs, without a threshold. In the second and third columns the SDM output is first converted to a binary output using a threshold (‘SS=SP’ or ‘max(SS+SP)’; see threshold definitions in the main text and similar results for other thresholds in Appendix S4). The sum of continuous outputs of PA SDMs provides a good estimation of species richness; PB SDMs cannot estimate richness well given that their output does not represent a probability of presence. Applying thresholds does not solve the problem and is detrimental when using PA data. Despite not being able to estimate absolute richness, in this example PB SDMs provide a relatively robust ranking of sites in terms of richness (although worse than PA SDMs).

Figure 6 Spatial prioritization case study. Scatterplots represent the proportion of occupied area selected by Marxan (y-axis) for the seven considered species with respect to the requested target (x-axis) in different scenarios. In all panels the same targets are applied, although they are expressed in two ways: as absolute areas (left panels) or as a proportion of area (right panels). Closed circles represent solutions produced for the original SDM probabilities, open circles for scaled probabilities, and other symbols represent solutions where thresholds were applied to the latter to create binary SDM outputs: ‘MTP’, ‘10% TP’, ‘SS=SP’ and ‘max(SS+SP)’ (symbols as shown in the figure; see threshold definitions in the main text). The values displayed in the legend of each plot represent the total cost for each prioritization solution.
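The contrast between absolute and proportional targets described for Figure 6 can be illustrated with a toy calculation. The sketch below is not Marxan's optimisation; it is a hypothetical greedy 'minimum set' stand-in with equal site costs, applied to made-up occurrence probabilities and to a scaled (relative) version of them with an assumed scaling constant of 0.5.

```python
def sites_needed(values, target):
    """Greedy stand-in for a 'minimum set' solution (illustrative only).

    Picks sites from highest to lowest value until the summed value of the
    selected sites reaches the target; returns the number of sites selected.
    """
    total = 0.0
    chosen = 0
    for value in sorted(values, reverse=True):
        if total >= target:
            break
        total += value
        chosen += 1
    return chosen


# Hypothetical occurrence probabilities at four equally costly sites.
probabilities = [0.75, 0.5, 0.5, 0.25]
# Scaled output: proportional to the probabilities, true scale unknown.
scaled = [0.5 * p for p in probabilities]

# Proportional target: retain 50% of the summed distribution.
prop_sites_prob = sites_needed(probabilities, 0.5 * sum(probabilities))
prop_sites_scaled = sites_needed(scaled, 0.5 * sum(scaled))

# Absolute target: retain an expected occupied area of 1.0 site-equivalents.
abs_sites_prob = sites_needed(probabilities, 1.0)
abs_sites_scaled = sites_needed(scaled, 1.0)

print(prop_sites_prob, prop_sites_scaled)  # 2 2: scaling does not matter
print(abs_sites_prob, abs_sites_scaled)    # 2 4: scaling doubles the cost
```

Under these assumptions the proportional target is met by the same two sites whatever the scaling, whereas the absolute target requires twice as many sites when the scaled output is misread as a probability.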

Figures

Fig. 1

Fig. 2

Fig. 3 [Panels: rows PA and PB; columns continuous output, binary output (SS=SP) and binary output (max(SS+SP)). x-axis: true AOO (%); y-axis: estimated AOO (%); both axes from 0.0 to 1.0.]

Fig. 4 [Panels: rows true, PA and PB; columns distribution model, optimal survey length and cost ratio. Colour scales: distribution model 0.00-0.08 (true, PA) and 0.0-0.5 (PB); optimal survey length 0-6; cost ratio 0.0-3.0.]

Fig. 5 [Panels: rows PA and PB; columns continuous output, binary output (SS=SP) and binary output (max(SS+SP)). x-axis: true richness; y-axis: estimated richness; both axes from 0 to 100.]

Fig. 6 [Panels (a)-(d): columns Absolute target and Proportional target; rows Continuous output and Binary output. x-axis: target (%); y-axis: achieved (%); both axes from 0.0 to 1.0.]

SUPPORTING INFORMATION

Appendix S1: Literature review

Appendix S2: Additional figures

Appendix S3: Assessment of SDM output adequacy in ecological, biogeographical and conservation applications

Appendix S4: Details about case studies

Matching SDMs to applications Appendix S1 – Literature Review

Appendix S1: Literature review

Methods

We searched ISI Web of Science for peer-reviewed journal articles published between January 2008 and February 2014 using the topic terms ‘species distribution model’, ‘ecological niche model’ and ‘habitat model’. We decided to include only studies published in the last five years, as we were interested in patterns of current development and use of SDMs in ecology, biogeography and conservation, not historical developments. This search produced 617 results, of which 100 papers were chosen at random to be included in the review. Of these 100, we discarded papers that did not detail the development of an SDM, either because they discussed SDM methodology more broadly (17 papers) or because they modelled population dynamics, and replaced them with others chosen at random from the remaining papers that fit the criteria. A further nine papers developed an SDM, but did not use a statistical approach to model occupancy with presence-absence (PA), presence-background (PB), presence-only (PO) or occupancy-detection (DET) data. These were tabulated, but not included in our summary statistics.

For each paper, we recorded the main purpose of the study, the type of data used to fit the SDM and the associated algorithm, and whether a threshold was applied to the final predictions to create a binary presence/absence map or categories of suitability. Where a threshold had been applied, we recorded how it had been set. Where more than one data type, threshold, or methodological approach was presented in a paper, all were recorded. We also noted whether each study had involved any field work or data collection, or whether the data had been obtained from a single source and contained known presences and absences, such as monitoring data collected by a government agency (e.g. Wenger & Olden 2012, Swanson et al. 2013). We used this information to assess whether the authors had presence-absence data at their disposal but chose to analyse it as presence-background instead.

Results

Of the 100 papers considered in our review, the most common motivations for building an SDM were to develop or to test modelling methodology (17 papers each; Table S1.1). Other themes included informing the management of a threatened species (14 papers), predicting the potential impacts of climate change (12 papers), and exploring phylogeographic patterns (9 papers).

A large proportion of studies had high-quality data at their disposal, as 67% had carried out some form of data collection or collated PA data from a single source. However, PB data were most commonly used to build SDMs (n = 53 studies). Presence-absence data were also commonly used (n = 47), whereas use of PO data (n = 11) and DET data (n = 5) was relatively rare. A total of 36 different modelling approaches were used to develop SDMs (Table S1.2), and 26 of the studies tested more than one approach. Generalized regression models (GRMs), including generalized linear models (GLMs), generalized additive models (GAMs), and their mixed-effects equivalents (GLMMs and GAMMs), were the most frequently used (44 studies), followed by MaxEnt (36 studies). All other algorithms were employed in fewer than 10 studies. All nine studies that used SDMs to explore patterns of phylogeography analysed PB data solely with MaxEnt.

More than half (54%) of the papers applied some form of threshold to their SDMs. Reasons given for doing so included the need to identify critical habitat, simplify subsequent analysis, create distribution maps and evaluate the model. Across the studies, thresholds were set using 14 different approaches, the most common being the minimum training presence (n = 17), the threshold that resulted in equal sensitivity and specificity (n = 11), that which maximized the sum of sensitivity and specificity (n = 10), and that corresponding to the 10-percentile of training presence estimates (n = 9; Table S1.2).
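As a rough illustration of the four most common threshold rules reported above, the following Python sketch applies them to made-up suitability scores. The function names, the example scores, and the convention that a site is predicted present when its score is greater than or equal to the threshold are assumptions made for illustration only; they are not taken from any of the reviewed papers or from a particular SDM package.

```python
def minimum_training_presence(presence_scores):
    """MTP: lowest predicted score at any training presence."""
    return min(presence_scores)


def percentile_training_presence(presence_scores, percent=10):
    """Threshold that leaves out the lowest `percent` % of training presences."""
    ordered = sorted(presence_scores)
    n_omitted = (percent * len(ordered)) // 100   # whole number of presences omitted
    return ordered[min(n_omitted, len(ordered) - 1)]


def sensitivity_specificity(threshold, presence_scores, absence_scores):
    """Sensitivity and specificity when score >= threshold means 'present'."""
    sens = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    spec = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sens, spec


def max_sum_threshold(presence_scores, absence_scores):
    """max s+s: candidate score that maximizes sensitivity + specificity."""
    candidates = sorted(set(presence_scores) | set(absence_scores))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(t, presence_scores, absence_scores)),
    )


def equal_sens_spec_threshold(presence_scores, absence_scores):
    """s=s: candidate score where sensitivity and specificity are closest."""
    candidates = sorted(set(presence_scores) | set(absence_scores))

    def gap(t):
        sens, spec = sensitivity_specificity(t, presence_scores, absence_scores)
        return abs(sens - spec)

    return min(candidates, key=gap)


# Hypothetical predicted scores at known presences and known absences.
presence = [0.90, 0.80, 0.70, 0.60, 0.55, 0.50, 0.45, 0.40, 0.35, 0.20]
absence = [0.60, 0.50, 0.30, 0.25, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05]

thresholds = {
    "MTP": minimum_training_presence(presence),
    "10%": percentile_training_presence(presence),
    "max s+s": max_sum_threshold(presence, absence),
    "s=s": equal_sens_spec_threshold(presence, absence),
}
print(thresholds)
```

With these invented scores the four rules give 0.2, 0.35, 0.35 and 0.4 respectively, which shows how the choice of rule alone can change the resulting binary map. Note that the two sensitivity-based rules need absence (or background) scores, whereas the two training-presence rules use presences only.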

Guillera-Arroita et al. - Global Ecology and Biogeography

References

Abdel-Dayem, M.S., Annajar, B.B., Hanafi, H.A. & Obenauer, P.J. (2012). The potential distribution of Phlebotomus papatasi (Diptera: Psychodidae) in Libya based on ecological niche model. J. Med. Entomol., 49, 739–745.

Altmoos, M. & Henle, K. (2010). Relevance of multiple spatial scales in habitat models: A case study with amphibians and grasshoppers. Acta Oecologica-International J. Ecol., 36, 548–560.

Ball, L.C., Doherty, P.F., Ostermann-Kelm, S.D. & McDonald, M.W. (2010). Effects of rain on Palm Springs ground squirrel occupancy in the Sonoran Desert. J. Wildl. Manage., 74, 954–962.

Banakar, V., de Magny, G.C., Jacobs, J., Murtugudde, R., Huq, A., Wood, R.J., et al. (2011). Temporal and spatial variability in the distribution of Vibrio vulnificus in the Chesapeake Bay: a hindcast study. Ecohealth, 8, 456–467.

Beans, C.M., Kilkenny, F.F. & Galloway, L.F. (2012). Climate suitability and human influences combined explain the range expansion of an invasive horticultural plant. Biol. Invasions, 14, 2067–2078.

Beaugrand, G., Lenoir, S., Ibanez, F. & Mante, C. (2011). A new model to assess the probability of occurrence of a species, based on presence-only data. Mar. Ecol. Prog. Ser., 424, 175–190.

Bellamy, C., Scott, C. & Altringham, J. (2013). Multiscale, presence-only habitat suitability models: fine-resolution maps for eight bat species. J. Appl. Ecol., 50, 892–901.

Benito, B., Lorite, J. & Penas, J. (2011). Simulating potential effects of climatic warming on altitudinal patterns of key species in Mediterranean-alpine ecosystems. Clim. Change, 108, 471–483.

Benito, B.M., Lorite, J., Perez-Perez, R., Gomez-Aparicio, L. & Penas, J. (2014). Forecasting plant range collapse in a Mediterranean hotspot: when dispersal uncertainties matter. Divers. Distrib., 20, 72–83.
Bentlage, B., Peterson, A.T., Barve, N. & Cartwright, P. (2013). Plumbing the depths: extending ecological niche modelling and species distribution modelling in three dimensions. Glob. Ecol. Biogeogr., 22, 952–961.

Blair, M.E., Sterling, E.J., Dusch, M., Raxworthy, C.J. & Pearson, R.G. (2013). Ecological divergence and speciation between lemur (Eulemur) sister species in Madagascar. J. Evol. Biol., 26, 1790–1801.

Bonthoux, S. & Balent, G. (2012). Point count duration: five minutes are usually sufficient to model the distribution of bird species and to study the structure of communities for a French landscape. J. Ornithol., 153, 491–504.

Boulangeat, I., Gravel, D. & Thuiller, W. (2012). Accounting for dispersal and biotic interactions to disentangle the drivers of species distributions and their abundances. Ecol. Lett., 15, 584–593.

Buckley, L.B. (2010). The range implications of lizard traits in changing environments. Glob. Ecol. Biogeogr., 19, 452–464.

Cao, Y., DeWalt, R.E., Robinson, J.L., Tweddale, T., Hinz, L. & Pessino, M. (2013). Using Maxent to model the historic distributions of stonefly species in Illinois streams: The effects of regularization and threshold selections. Ecol. Modell., 259, 30–39.

Carroll, C., Johnson, D.S., Dunk, J.R. & Zielinski, W.J. (2010). Hierarchical Bayesian spatial models for multispecies conservation planning and monitoring. Conserv. Biol., 24, 1538–1548.

Chen, X.J., Tian, S.Q., Chen, Y. & Liu, B.L. (2010). A modeling approach to identify optimal habitat and suitable fishing grounds for neon flying squid (Ommastrephes bartramii) in the Northwest Pacific Ocean. Fish. Bull., 108, 1–14.

Cianfrani, C., Le Lay, G., Hirzel, A.H. & Loy, A. (2010). Do habitat suitability models reliably predict the recovery areas of threatened species? J. Appl. Ecol., 47, 421–430.

Colacicco-Mayhugh, M.G., Masuoka, P.M. & Grieco, J.P. (2010). Ecological niche model of Phlebotomus alexandri and P. papatasi (Diptera: Psychodidae) in the Middle East. Int. J. Health Geogr., 9.

Crimmins, S.M., Dobrowski, S.Z. & Mynsberge, A.R. (2013). Evaluating ensemble forecasts of plant species distributions under climate change. Ecol. Modell., 266, 126–130.

de Magny, G.C., Long, W., Brown, C.W., Hood, R.R., Huq, A., Murtugudde, R., et al. (2009). Predicting the distribution of Vibrio spp. in the Chesapeake Bay: A Vibrio cholerae case study. Ecohealth, 6, 378–389.

Delahaye, L., Monticelli, D., Lehaire, F., Rondeux, J. & Claessens, H. (2010). Fine-scale habitat selection by two specialist woodpeckers occurring in beech and oak-dominated forests in southern Belgium. Ardeola, 57, 339–362.

Dobrowski, S.Z., Thorne, J.H., Greenberg, J.A., Safford, H.D., Mynsberge, A.R., Crimmins, S.M., et al. (2011). Modeling plant ranges over 75 years of climate change in California, USA: temporal transferability and species traits. Ecol. Monogr., 81, 241–257.

Doherty, K.E., Naugle, D.E., Walker, B.L. & Graham, J.M. (2008). Greater sage-grouse winter habitat selection and energy development. J. Wildl. Manage., 72, 187–195.

Domisch, S., Jahnig, S.C. & Haase, P. (2011). Climate-change winners and losers: stream macroinvertebrates of a submontane region in Central Europe. Freshw. Biol., 56, 2009–2020.

Druon, J.N., Panigada, S., David, L., Gannier, A., Mayol, P., Arcangeli, A., et al. (2012). Potential feeding habitat of fin whales in the western Mediterranean Sea: an environmental niche model. Mar. Ecol. Prog. Ser., 464, 289–U333.

Dunstan, P.K., Foster, S.D., Hui, F.K.C. & Warton, D.I. (2013). Finite mixture of regression modeling for high-dimensional count and biomass data in ecology. J. Agric. Biol. Environ. Stat., 18, 357–375.

Ensing, D.J., Moffat, C.E. & Pither, J. (2013). Taxonomic identification errors generate misleading ecological niche model predictions of an invasive hawkweed. Botany-Botanique, 91, 137–147.

Fordham, D.A., Akcakaya, H.R., Araujo, M.B., Elith, J., Keith, D.A., Pearson, R., et al. (2012). Plant extinction risk under climate change: are forecast range shifts alone a good indicator of species vulnerability to global warming? Glob. Chang. Biol., 18, 1357–1371.

Fordham, D.A., Mellin, C., Russell, B.D., Akcakaya, R.H., Bradshaw, C.J.A., Aiello-Lammens, M.E., et al. (2013). Population dynamics can be more important than physiological limits for determining range shifts under climate change. Glob. Chang. Biol., 19, 3224–3237.

Frey, S.J.K., Strong, A.M. & McFarland, K.P. (2012). The relative contribution of local habitat and landscape context to metapopulation processes: a dynamic occupancy modeling approach. Ecography (Cop.), 35, 581–589.

Fukuda, S. & De Baets, B. (2012). Do absence data matter when modelling fish habitat preference using a genetic Takagi-Sugeno fuzzy model? Int. J. Uncertain. Fuzziness Knowledge-Based Syst., 20, 233–245.


Funk, A., Gschopf, C., Blaschke, A.P., Weigelhofer, G. & Reckendorfer, W. (2013). Ecological niche models for the evaluation of management options in an urban floodplain-conservation vs. restoration purposes. Environ. Sci. Policy, 34, 79–91.

Gavashelishvili, A. & Javakhishvili, Z. (2010). Combining radio-telemetry and random observations to model the habitat of near threatened Caucasian grouse Tetrao mlokosiewiczi. Oryx, 44, 491–500.

Gonzalez, V.H., Koch, J.B. & Griswold, T. (2010). Anthidium vigintiduopunctatum Friese (Hymenoptera: Megachilidae): the elusive “dwarf bee” of the Galapagos Archipelago? Biol. Invasions, 12, 2381–2383.

Gregr, E.J. & Trites, A.W. (2008). A novel presence-only validation technique for improved Steller sea lion Eumetopias jubatus critical habitat descriptions. Mar. Ecol. Prog. Ser., 365, 247–261.

Hein, C.L., Ohlund, G. & Englund, G. (2012). Future distribution of Arctic char Salvelinus alpinus in Sweden under climate change: effects of temperature, lake size and species interactions. Ambio, 41, 303–312.

Higa, M., Tsuyama, I., Nakao, K., Nakazono, E., Matsui, T. & Tanaka, N. (2013). Influence of nonclimatic factors on the habitat prediction of tree species and an assessment of the impact of climate change. Landsc. Ecol. Eng., 9, 111–120.

Higgins, S.I., O’Hara, R.B., Bykova, O., Cramer, M.D., Chuine, I., Gerstner, E.M., et al. (2012). A physiological analogy of the niche for projecting the potential distribution of plants. J. Biogeogr., 39, 2132–2145.

Hope, A.G., Waltari, E., Dokuchaev, N.E., Abramov, S., Dupal, T., Tsvetkova, A., et al. (2010). High-latitude diversification within Eurasian least shrews and Alaska tiny shrews (Soricidae). J. Mammal., 91, 1041–1057.

Hope, A.G., Waltari, E., Fedorov, V.B., Goropashnaya, A.V., Talbot, S.L. & Cook, J.A. (2011). Persistence and diversification of the Holarctic shrew, Sorex tundrensis (Family Soricidae), in response to climate change. Mol. Ecol., 20, 4346–4370.

Ishihama, F., Takeda, T., Oguma, H. & Takenaka, A. (2010). Comparison of effects of spatial autocorrelation on distribution predictions of four rare plant species in the Watarase wetland. Ecol. Res., 25, 1057–1069.

Isojunno, S., Matthiopoulos, J. & Evans, P.G.H. (2012). Harbour porpoise habitat preferences: robust spatio-temporal inferences from opportunistic data. Mar. Ecol. Prog. Ser., 448, 155–U242.

Jarvis, A., Lane, A. & Hijmans, R.J. (2008). The effect of climate change on crop wild relatives. Agric. Ecosyst. Environ., 126, 13–23.

Johnston, K.M., Freund, K.A. & Schmitz, O.J. (2012). Projected range shifting by montane mammals under climate change: implications for Cascadia’s National Parks. Ecosphere, 3.

Johnston, M.W. & Purkis, S.J. (2012). Invasionsoft: A web-enabled tool for invasive species colonization predictions. Aquat. Invasions, 7, 405–417.

Kattwinkel, M., Strauss, B., Biedermann, R. & Kleyer, M. (2009). Modelling multi-species response to landscape dynamics: mosaic cycles support urban biodiversity. Landsc. Ecol., 24, 929–941.

Kelly, L.T., Dayman, R., Nimmo, D.G., Clarke, M.F. & Bennett, A.F. (2013). Spatial and temporal drivers of small mammal distributions in a semi-arid environment: The role of rainfall, vegetation and life-history. Austral Ecol., 38, 786–797.

Kery, M., Royle, J.A., Schmid, H., Schaub, M., Volet, B., Hafliger, G., et al. (2010). Site-occupancy distribution modeling to correct population-trend estimates derived from opportunistic observations. Conserv. Biol., 24, 1388–1397.

Khatchikian, C., Sangermano, F., Kendell, D. & Livdahl, T. (2011). Evaluation of species distribution model algorithms for fine-scale container-breeding mosquito risk prediction. Med. Vet. Entomol., 25, 268–275.

Kissling, W.D., Fernandez, N. & Paruelo, J.M. (2009). Spatial risk assessment of livestock exposure to pumas in Patagonia, Argentina. Ecography (Cop.), 32, 807–817.

Konig, A., Janko, C., Barla-Szabo, B., Fahrenhold, D., Heibl, C., Perret, E., et al. (2012). Habitat model for baiting foxes in suburban areas to counteract Echinococcus multilocularis. Wildl. Res., 39, 488–495.

La Manna, L., Matteucci, S.D. & Kitzberger, T. (2008). Abiotic factors related to the incidence of the Austrocedrus chilensis disease syndrome at a landscape scale. For. Ecol. Manage., 256, 1087–1095.

Larson, E.R., Olden, J.D. & Usio, N. (2010). Decoupled conservatism of Grinnellian and Eltonian niches in an invasive . Ecosphere, 1.

Leathwick, J., Moilanen, A., Francis, M., Elith, J., Taylor, P., Julian, K., et al. (2008). Novel methods for the design and evaluation of marine protected areas in offshore waters. Conserv. Lett., 1, 91–102.

Leathwick, J.R., Snelder, T., Chadderton, W.L., Elith, J., Julian, K. & Ferrier, S. (2011). Use of generalised dissimilarity modelling to improve the biological discrimination of river and stream classifications. Freshw. Biol., 56, 21–38.

Lippitt, C.D., Rogan, J., Toledano, J., Sangermano, F., Eastman, J.R., Mastro, V., et al. (2008). Incorporating anthropogenic variables into a species distribution model to map gypsy moth risk. Ecol. Modell., 210, 339–350.

Machemer, E.G.P., Walter, J.F., Serafy, J.E. & Kerstetter, D.W. (2012). Importance of mangrove shorelines for rainbow parrotfish Scarus guacamaia: habitat suitability modeling in a subtropical bay. Aquat. Biol., 15, 87–98.

Malaney, J.L., Conroy, C.J., Moffitt, L.A., Spoonhunter, H.D., Patton, J.L. & Cook, J.A. (2013). Phylogeography of the western jumping mouse (Zapus princeps) detects deep and persistent allopatry with expansion. J. Mammal., 94, 1016–1029.

Manzu, C., Gherghel, I., Zamfirescu, S., Zamfirescu, O., Rosca, I. & Strugariu, A. (2013). Current and future potential distribution of glacial relict Ligularia sibirica (Asteraceae) in Romania and temporal contribution of Natura 2000 to protect the species in light of global change. Carpathian J. Earth Environ. Sci., 8, 77–87.

Marske, K.A., Leschen, R.A.B., Barker, G.M. & Buckley, T.R. (2009). Phylogeography and ecological niche modelling implicate coastal refugia and trans-alpine dispersal of a New Zealand fungus beetle. Mol. Ecol., 18, 5126–5142.

Martin, J., Revilla, E., Quenette, P.Y., Naves, J., Allaine, D. & Swenson, J.E. (2012). Brown bear habitat suitability in the Pyrenees: transferability across sites and linking scales to make the most of scarce data. J. Appl. Ecol., 49, 621–631.

Matthews, A., Spooner, P.G., Lunney, D., Green, K. & Klomp, N.I. (2010). The influences of snow cover, vegetation and topography on the upper range limit of common wombats Vombatus ursinus in the subalpine zone, Australia. Divers. Distrib., 16, 277–287.

McCarren, K.L. & Scott, J.K. (2013). Host range and potential distribution of Aceria thalgi (Acari: Eriophyidae): a biological control agent for Sonchus species. Aust. J. Entomol., 52, 393–402.

Measey, G.J., Rodder, D., Green, S.L., Kobayashi, R., Lillo, F., Lobos, G., et al. (2012). Ongoing invasions of the African clawed frog, Xenopus laevis: a global review. Biol. Invasions, 14, 2255–2270.

Mocq, J., St-Hilaire, A. & Cunjak, R.A. (2013). Assessment of Atlantic salmon (Salmo salar) habitat quality and its uncertainty using a multiple-expert fuzzy model applied to the Romaine River (Canada). Ecol. Modell., 265, 14–25.

Morin, X., Viner, D. & Chuine, I. (2008). Tree species range shifts at a continental scale: new predictive insights from a process-based model. J. Ecol., 96, 784–794.

Muhlfeld, C.C., Jones, L., Kotter, D., Miller, W.J., Geise, D., Tohtz, J., et al. (2012). Assessing the impacts of river regulation on native bull trout (Salvelinus confluentus) and westslope cutthroat trout (Oncorhynchus clarkii lewisi) habitats in the upper Flathead River, Montana, USA. River Res. Appl., 28, 940–959.

Mukherjee, A., Diaz, R., Thom, M., Overholt, W.A. & Cuda, J.P. (2012). Niche-based prediction of establishment of biocontrol agents: an example with Gratiana boliviana and tropical soda apple. Biocontrol Sci. Technol., 22, 447–461.

Nagaraju, S.K., Gudasalamani, R., Barve, N., Ghazoul, J., Narayanagowda, G.K. & Ramanan, U.S. (2013). Do ecological niche model predictions reflect the adaptive landscape of species?: A test using Myristica malabarica Lam., an endemic tree in the Western Ghats, India. PLoS One, 8.

Niamir, A., Skidmore, A.K., Toxopeus, A.G., Munoz, A.R. & Real, R. (2011). Finessing atlas data for species distribution models. Divers. Distrib., 17, 1173–1185.

Osawa, T., Mitsuhashi, H., Uematsu, Y. & Ushimaru, A. (2011). Bagging GLM: Improved generalized linear model for the analysis of zero-inflated data. Ecol. Inform., 6, 270–275.

Osborne, P.E. & Leitao, P.J. (2009). Effects of species and habitat positional errors on the performance and interpretation of species distribution models. Divers. Distrib., 15, 671–681.

Owens, H.L., Campbell, L.P., Dornak, L.L., Saupe, E.E., Barve, N., Soberon, J., et al. (2013). Constraints on interpretation of ecological niche models by limited environmental ranges on calibration areas. Ecol. Modell., 263, 10–18.

Peters, J., De Baets, B., Van Doninck, J., Calvete, C., Lucientes, J., De Clercq, E.M., et al. (2011). Absence reduction in entomological surveillance data to improve niche-based distribution models for Culicoides imicola. Prev. Vet. Med., 100, 15–28.

Pinto, C.M., Marchan-Rivadeneira, M.R., Tapia, E.E., Carrera, J.P. & Baker, R.J. (2013). Distribution, abundance and roosts of the fruit bat Artibeus fraterculus (Chiroptera: Phyllostomidae). Acta Chiropterologica, 15, 85–94.

Pulgarin-R, P.C. & Burg, T.M. (2012). Genetic signals of demographic expansion in downy woodpecker (Picoides pubescens) after the last North American glacial maximum. PLoS One, 7.

Qi, D.W., Zhang, S.N., Zhang, Z.J., Hu, Y.B., Yang, X.Y., Wang, H.J., et al. (2011). Different habitat preferences of male and female giant pandas. J. Zool., 285, 205–214.

Qu, Y.H., Zhang, R.Y., Quan, Q., Song, G., Li, S.H. & Lei, F.M. (2012). Incomplete lineage sorting or secondary admixture: disentangling historical divergence from recent gene flow in the Vinous-throated parrotbill (Paradoxornis webbianus). Mol. Ecol., 21, 6117–6133.

Rombouts, I., Beaugrand, G. & Dauvin, J.C. (2012). Potential changes in benthic macrofaunal distributions from the English Channel simulated under climate change scenarios. Estuar. Coast. Shelf Sci., 99, 153–161.

Rubio-Salcedo, M., Martinez, I., Carreno, F. & Escudero, A. (2013). Poor effectiveness of the Natura 2000 network protecting Mediterranean lichen species. J. Nat. Conserv., 21, 1–9.

Saupe, E.E., Barve, V., Myers, C.E., Soberon, J., Barve, N., Hensz, C.M., et al. (2012). Variation in niche and distribution model performance: The need for a priori assessment of key causal factors. Ecol. Modell., 237, 11–22.

Schuttler, E., Ibarra, J.T., Gruber, B., Rozzi, R. & Jax, K. (2010). Abundance and habitat preferences of the southernmost population of mink: implications for managing a recent island invasion. Biodivers. Conserv., 19, 725–743.

Seavy, N.E., Viers, J.H. & Wood, J.K. (2009). Riparian bird response to vegetation structure: a multiscale analysis using LiDAR measurements of canopy height. Ecol. Appl., 19, 1848–1857.

Smith, K.M., Keeton, W.S., Donovan, T.M. & Mitchell, B. (2008). Stand-level forest structure and avian habitat: Scale dependencies in predicting occurrence in a heterogeneous forest. For. Sci., 54, 36–46.

Smolik, M.G., Dullinger, S., Essl, F., Kleinbauer, I., Leitner, M., Peterseil, J., et al. (2010). Integrating species distribution models and interacting particle systems to predict the spread of an invasive alien plant. J. Biogeogr., 37, 411–422.

Stokland, J.N., Halvorsen, R. & Stoa, B. (2011). Species distribution modelling - Effect of design and sample size of pseudo-absence observations. Ecol. Modell., 222, 1800–1809.

Swanson, A.K., Dobrowski, S.Z., Finley, A.O., Thorne, J.H. & Schwartz, M.K. (2013). Spatial regression methods capture prediction uncertainty in species distribution model projections through time. Glob. Ecol. Biogeogr., 22, 242–251.

Todisco, V., Gratton, P., Zakharov, E.V., Wheat, C.W., Sbordoni, V. & Sperling, F.A.H. (2012). Mitochondrial phylogeography of the Holarctic Parnassius phoebus complex supports a recent refugial model for alpine butterflies. J. Biogeogr., 39, 1058–1072.

Toth, J.P., Varga, K., Vegvari, Z. & Varga, Z. (2013). Distribution of the Eastern knapweed fritillary ( ornata Christoph, 1893) (: ): past, present and future. J. Conserv., 17, 245–255.

Trudeau, C., Imbeau, L., Drapeau, P. & Mazerolle, M.J. (2012). Winter site occupancy patterns of the northern flying squirrel in boreal mixed-wood forests. Mamm. Biol., 77, 258–263.

Vacchiano, G., Barni, E., Lonati, M., Masante, D., Curtaz, A., Tutino, S., et al. (2013). Monitoring and modeling the invasion of the fast spreading alien Senecio inaequidens DC. in an alpine region. Plant Biosyst., 147, 1139–1147.

Vaclavik, T. & Meentemeyer, R.K. (2009). Invasive species distribution modeling (iSDM): Are absence data and dispersal constraints needed to predict actual distributions? Ecol. Modell., 220, 3248–3258.

Valencia-Lopez, N., Malone, J.B., Carmona, C.G. & Velasquez, L.E. (2012). Climate-based risk models for Fasciola hepatica in Colombia. Geospat. Health, 6, S75–S85.


Vanhatalo, J., Veneranta, L. & Hudd, R. (2012). Species distribution modeling with Gaussian processes: A case study with the youngest stages of sea spawning whitefish (Coregonus lavaretus L. s.l.) larvae. Ecol. Modell., 228, 49–58.

Vincenzi, S., De Leo, G.A., Munari, C. & Mistri, M. (2014). Rapid estimation of potential yield for data-poor Tapes philippinarum fisheries in North Adriatic coastal lagoons. Hydrobiologia, 724, 267–277.

Webber, B.L., Yates, C.J., Le Maitre, D.C., Scott, J.K., Kriticos, D.J., Ota, N., et al. (2011). Modelling horses for novel climate courses: insights from projecting potential distributions of native and alien Australian acacias with correlative and mechanistic models. Divers. Distrib., 17, 978–1000.

Wells, A.G., Wallin, D.O., Rice, C.G. & Chang, W.Y. (2011). GPS bias correction and habitat selection by mountain goats. Remote Sens., 3, 435–459.

Wenger, S.J. & Olden, J.D. (2012). Assessing transferability of ecological models: an underappreciated aspect of statistical validation. Methods Ecol. Evol., 3, 260–267.

Williams, C.J. & Heglund, P.J. (2009). A method for assigning species into groups based on generalized Mahalanobis distance between habitat model coefficients. Environ. Ecol. Stat., 16, 495–513.

Wilting, A., Cord, A., Hearn, A.J., Hesse, D., Mohamed, A., Traeholdt, C., et al. (2010). Modelling the species distribution of flat-headed cats (Prionailurus planiceps), an Endangered South-East Asian Small Felid. PLoS One, 5.

Wisz, M., Dendoncker, N., Madsen, J., Rounsevell, M., Jespersen, M., Kuijken, E., et al. (2008a). Modelling pink-footed goose (Anser brachyrhynchus) wintering distributions for the year 2050: potential effects of land-use change in Europe. Divers. Distrib., 14, 721–731.

Wisz, M.S., Hijmans, R.J., Li, J., Peterson, A.T., Graham, C.H., Guisan, A., et al. (2008b). Effects of sample size on the performance of species distribution models. Divers. Distrib., 14, 763–773.

Xu, Z.L., Zhao, C.Y., Feng, Z.D., Zhang, F., Sher, H., Wang, C., et al. (2013). Estimating realized and potential carbon storage benefits from reforestation and afforestation under climate change: a case study of the Qinghai spruce forests in the Qilian Mountains, northwestern China. Mitig. Adapt. Strateg. Glob. Chang., 18, 1257–1268.

Yates, C.J., McNeill, A., Elith, J. & Midgley, G.F. (2010). Assessing the impacts of climate change and land transformation on Banksia in the South West Australian Floristic Region. Divers. Distrib., 16, 187–201.

Yun, J.H., Nakao, K., Tsuyama, I., Higa, M., Matsui, T., Park, C.H., et al. (2014). Does future climate change facilitate expansion of evergreen broad-leaved tree species in the human-disturbed landscape of the Korean Peninsula? Implication for monitoring design of the impact assessment. J. For. Res., 19, 174–183.

Zhang, M.G., Zhou, Z.K., Chen, W.Y., Slik, J.W.F., Cannon, C.H. & Raes, N. (2012). Using species distribution modeling to improve conservation and land use planning of Yunnan, China. Biol. Conserv., 153, 257–264.

Zhu, K., Woodall, C.W., Ghosh, S., Gelfand, A.E. & Clark, J.S. (2014). Dual impacts of climate change: forest migration and turnover through life history. Glob. Chang. Biol., 20, 251–264.

Zohmann, M., Pennerstorfer, J. & Nopp-Mayr, U. (2013). Modelling habitat suitability for alpine rock ptarmigan (Lagopus muta helvetica) combining object-based classification of IKONOS imagery and Habitat Suitability Index modelling. Ecol. Modell., 254, 22–32.


Table S1.1 – Summary of literature analysed in our review, including the reported purpose of the SDM, details regarding data type and method, whether a threshold was applied to the final SDM, and if so, what type of threshold. Abbreviated data types correspond to: presence-absence (PA), presence-background (PB), presence-only (PO), occupancy-detection (DET), information obtained from the literature or experts (Expert/Lit), abundance or biomass (Abund), or mechanistic data relating to the physiological tolerances or dispersal capabilities of individuals (Mechanistic). The abbreviations used to indicate methods and thresholds are listed in Table S1.2 (below). Papers marked with an asterisk (*) were not counted in the statistics relating to the review as they did not model occupancy using PA, PB, PO or DET data.

Paper | Purpose | Data type | Methods | No. of methods | Data collected | Threshold applied | Threshold type
Abdel-Dayem et al (2012) | Manage threatening processes | PB | MaxEnt | 1 | Y | Y | quants
Altmoos & Henle (2010) | Test SDM methodology | PA | GRM | 1 | Y | N | NA
Ball et al (2010) | Threatened species management | DET | O-D | 1 | Y | N | NA
Banakar et al (2011) | Manage threatening processes | PA | GRM | 1 | Y | N | NA
Beans et al (2012) | Invasion | PB | MaxEnt | 1 | N | N | NA
Beaugrand et al (2011) | Develop SDM methodology | PB | NPPEN | 1 | N | N | NA
Bellamy et al (2013) | Protected area/landscape management | PB | MaxEnt | 1 | Y | Y | MTP, 10%
Benito et al (2011) | Predict climate change impacts | PB | MaxEnt | 1 | N | Y | µ – σ of presences
Benito et al (2014) | Predict climate change impacts | PO, PB | MaxEnt, GARP, ANN, SVM | 4 | N | Y | fixed prob
Bentlage et al (2013) | Develop SDM methodology | PB | MaxEnt, GARP | 2 | N | Y | MTP
Blair et al (2013) | Phylogeography | PB | MaxEnt | 1 | N | Y | MTP
Bonthoux & Balent (2012) | Test SDM methodology | PA | GRM, ANN, BRT, RF, MDA, MARS | 5 | Y | Y | MTP
Boulangeat et al (2012) | Develop SDM methodology | PA, Abund | RF | 1 | Y | N | NA
Buckley (2010) | Develop SDM methodology | Mechanistic, PB | FE, MaxEnt | 2 | Y | N | NA
Cao et al (2013) | Test SDM methodology | PB | MaxEnt | 1 | N | Y | MTP, 10%, s=s, max s+s, EETOD, BTOPAT
Carroll et al (2010) | Test SDM methodology | PA | iCAR, GRM, MaxEnt | 3 | Y | N | NA
Chen et al (2010)* | Identify sites for production | Effort | Correlative HSI | 1 | Y | N | NA
Cianfrani et al (2010) | Test SDM methodology | PA, PB | GRM, ENFA | 2 | Y | Y | µ & σ of P/E curves
Colacicco-Mayhugh et al (2010) | Manage threatening processes | PB | MaxEnt | 1 | N | Y | MTP
Crimmins et al (2013) | Predict climate change impacts | PA | GRM | 1 | Y | Y | s=s
de Magny et al (2009) | Manage threatening processes | PA | GRM | 1 | Y | N | NA
Delahaye et al (2010) | Threatened species management | PA | GRM | 1 | Y | Y | max s+s
Dobrowski et al (2011) | Predict climate change impacts | PA | GRM, BRT, RF | 3 | Y | Y | s=s
Doherty et al (2008) | Threatened species management | PB | GRM | 1 | Y | Y | quants
Domisch et al (2011) | Predict climate change impacts | PB | GRM, BRT, ANN | 3 | N | Y | s=s
Druon et al (2012)* | Threatened species management | Expert/Lit | MCE | 1 | N | N | NA
Dunstan et al (2013) | Develop SDM methodology | PA, Abund | SAM | 1 | Y | N | NA
Ensing et al (2013) | Test SDM methodology | PB | MaxEnt | 1 | N | Y | MTP, 10%, fixed prob
Fordham et al (2012) | Develop SDM methodology | PB | MaxEnt, BIOCLIM, MD, ED, GRM, RF, GARP, ME | 8 | N | Y | quants
Fordham et al (2013) | Develop SDM methodology | PA, Abund | GRM, BRT | 1 | Y | Y | max k
Frey et al (2012) | Threatened species management | DET | Dy-O-D | 1 | Y | N | NA
Fukuda & De Baets (2012)* | Test SDM methodology | Expert/Lit, Abund | FUZZ | 1 | Y | N | NA
Funk et al (2013) | Protected area/landscape management | PA | GRM | 1 | Y | N | NA
Gavashelishvili & Lukarevskiy (2008) | Threatened species management | PA | GRM | 1 | Y | Y | s=s
Gonzalez et al (2010) | Invasion | PB | MaxEnt | 1 | N | N | NA
Gregr & Trites (2008)* | Threatened species management | Expert/Lit | Rule | 1 | Y | Y | top prop
Hein et al (2012) | Predict climate change impacts | PA | GRM | 1 | N | Y | max s+s
Higa et al (2013) | Test SDM methodology | PA | GRM | 1 | Y | Y | sens = 95%
Higgins et al (2012) | Develop SDM methodology | PA | GRM | 1 | Y | Y | max s+s
Hope et al (2010) | Phylogeography | PB | MaxEnt | 1 | N | Y | s=s, MTP
Hope et al (2011) | Phylogeography | PB | MaxEnt | 1 | N | Y | s=s, MTP
Ishihama et al (2010) | Test SDM methodology | PA | GRM, iCAR | 2 | Y | N | NA
Isojunno et al (2012) | Develop SDM methodology | PA | GRM | 1 | Y | N | NA
Jarvis et al (2008) | Predict climate change impacts | PO | BIOCLIM | 1 | N | Y | quants
Johnston & Purkis (2012) | Develop SDM methodology | Expert/Lit, PO | CA | 1 | N | N | NA
Johnston et al (2012) | Predict climate change impacts | PA, PB | GRM, MaxEnt | 2 | N | Y | fixed prob
Kattwinkel et al (2009) | Protected area/landscape management | PA | GRM | 1 | Y | Y | max k
Kelly et al (2013) | Threatened species management | PA | GRM | 1 | Y | N | NA
Kery et al (2010) | Develop SDM methodology | DET | Dy-O-D | 1 | N | N | NA
Khatchikian et al (2011) | Manage threatening processes | PB, PO, PA | GARP, MaxEnt, BIOCLIM, Domain, GRM | 5 | Y | N | NA
Kissling et al (2009) | Human-wildlife conflict | Expert/Lit, PB | Rule, GRM | 2 | N | Y | quants
Konig et al (2012) | Manage threatening processes | PA | GRM | 1 | Y | N | NA
La Manna et al (2008) | Manage threatening processes | PA | GRM | 1 | Y | N | NA
Larson et al (2010) | Invasion | PB | MaxEnt | 1 | Y | Y | MTP
Leathwick et al (2008) | Protected area/landscape management | PA | BRT | 1 | Y | N | NA
Leathwick et al (2011)* | Protected area/landscape management | Abund | GDM | 1 | N | N | NA
Lippitt et al (2008) | Develop SDM methodology | Expert/Lit, PA | MCE, GRM, BPN | 3 | Y | Y | s=s
Machemer et al (2012) | Threatened species management | PA | GRM | 1 | Y | Y | fixed prob
Malaney et al (2013) | Phylogeography | PB | MaxEnt | 1 | N | Y | MTP
Manzu et al (2013) | Predict climate change impacts | PB | MaxEnt | 1 | Y | N | NA
Marske et al (2009) | Phylogeography | PB | MaxEnt | 1 | Y | Y | 10%
Martin et al (2012) | Threatened species management | PA, PB, PO | GRM, ENFA, MD | 3 | Y | Y | Boyce index
Matthews et al (2010) | Predict climate change impacts | PA | GRM | 1 | Y | N | NA
McCarren & Scott (2013) | Biocontrol | Mechanistic, PO | CLIMEX | 1 | Y | N | NA
Measey et al (2012) | Invasion | PB | MaxEnt | 1 | N | Y | MTP, 10%
Mocq et al (2013)* | Protected area/landscape management | Expert/Lit, Abund | FUZZ | 1 | Y | N | NA
Morin et al (2008)* | Predict climate change impacts | Mechanistic | Phenofit | 1 | Y | Y | bins
Muhlfeld et al (2012) | Threatened species management | PO | 2D hist | 1 | Y | N | NA
Mukherjee et al (2012) | Biocontrol | PB | MaxEnt | 1 | N | Y | MTP
Nagaraju et al (2013) | Identify sites for production | PB, PO | MaxEnt, GARP, BIOCLIM | 3 | Y | Y | fixed prob
Niamir et al (2011) | Test SDM methodology | PB | MaxEnt | 1 | N | N | NA
39 Osawa et al (2011) Develop SDM methodology PA GRM 1 Y Y max s+s 40 Osborne & Leitao (2009) Test SDM methodology PA, PB GRM, MaxEnt 2 Y N NA 41 Owens et al (2013) Develop SDM methodology PA, PB GRM, MaxEnt, 3 Y N NA 42 43 44 Guillera-Arroita et al. - Global Ecology and Biogeography 11 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 59 of 93 Global Ecology and Biogeography

Matching SDMs to applications Appendix S1 – Literature Review 1 2 No. of Data Threshold 3 Paper Purpose Data type Methods Threshold type Methods collected applied 4 GARP 5 6 Peters et al (2011) Test SDM methodology PA GRM, LDA, RF 3 Y Y MTP 7 Pinto et al (2013) Threatened species management PB MaxEnt 1 Y Y MTP 8 Pulgarin-R & Burg (2012) Phylogeography PB MaxEnt 1 Y N NA 9 Qi et al (2011) Threatened species management PB ENFA 1 Y N NA 10 Qu et al (2012) Phylogeography PB MaxEnt 1 Y N NA 11 12 Rombouts et al (2012) Predict climate change impacts PB NPPEN 1 N N NA Protected area/landscape 13 Rubio-Salcedo et al (2013) For PeerPB ReviewENFA 1 N Y bins 14 management 15 GRM, GARP, 16 Saupe et al (2012) Test SDM methodology PA, PB, PO MaxEnt, BIOCLIM, 5 N Y MTP 17 Domain 18 Schuttler et al (2010) Invasion PA GRM 1 Y N NA 19 Protected area/landscape Seavy et al (2009) PA GRM 1 Y N NA 20 management 21 Smith et al (2008) Threatened species management DET O-D 1 Y N NA 22 23 Smolik et al (2010) Develop SDM methodology PA, Mechanistic GRM, IPS 2 N N NA 24 Stokland et al (2011) Test SDM methodology PB GRM, BRT 2 N N NA 25 Swanson et al (2013) Test SDM methodology PA GRM 1 Y Y s=s 26 Todisco et al (2012) Phylogeography PB MaxEnt 1 Y Y 10% 27 Toth et al (2013) Phylogeography PB MaxEnt 1 N Y 10% 28 Trudeau et al (2012) Threatened species management DET O-D 1 Y N NA 29 30 Vacchiano et al (2013) Invasion PB GRM 1 Y Y max s+s 31 Vaclavik & Meentemeyer et GRM, CT, MaxEnt, Test SDM methodology PA, PB 4 Y Y max s+s 32 al (2009) ENFA 33 Valencia-Lopez et al (2012) Manage threatening processes PB MaxEnt 1 Y N NA 34 Vanhatalo et al (2012) Develop SDM methodology PA GRM 1 Y N NA 35 Vincenzi et al (2014) Identify sites for production PA, Abund GRM 1 Y Y s=s 36 PB, Mechanistic, MaxEnt, BRT, 37 Webber et al (2011) Invasion 3 N Y MTP 38 PO CLIMEX 39 Wells et al (2011) Threatened species management PB RSPF 1 Y N NA 40 Wenger & Olden (2012) Test SDM methodology PA GRM, ANN, RF 3 Y N NA 41 Williams 
& Heglund (2009) Develop SDM methodology PA GRM 1 Y N NA 42 43 44 Guillera-Arroita et al. - Global Ecology and Biogeography 12 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Global Ecology and Biogeography Page 60 of 93

Matching SDMs to applications Appendix S1 – Literature Review 1 2 No. of Data Threshold 3 Paper Purpose Data type Methods Threshold type Methods collected applied 4 MaxEnt 5 Wilting et al (2010) Threatened species management PB 1 N Y 10% 6 Wisz et al (2008a) Threatened species management PA GRM 1 Y N NA 7 GRM, BRUTO, 8 BRT, MaxEnt, 9 GARP, OM-GARP, Wisz et al (2008b) Test SDM methodology PA, PB, PO 11 Y N NA 10 MARS, MARS-INT, 11 BIOCLIM, LIVES, 12 Domain 13 For Peer Review Protected area/landscape 14 Xu et al (2013) PB MaxEnt 1 Y Y max s+s 15 management 16 Yates et al (2010) Predict climate change impacts PB MaxEnt 1 Y Y max s+s 17 Yun et al (2014) Predict climate change impacts PA GRM 1 Y Y sens = 95% 18 Protected area/landscape Zhang et al (2012) PB MaxEnt 1 N Y 10%, max s+s, s=s 19 management 20 Bayesian joint Zhu et al (2014)* Predict climate change impacts Abund 1 Y N NA 21 response 22 Zohmann et al (2013)* Threatened species management Expert/Lit Rule 1 Y Y quants 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 Guillera-Arroita et al. - Global Ecology and Biogeography 13 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 Page 61 of 93 Global Ecology and Biogeography

Table S1.2 Details of abbreviations used in Table S1.1 to indicate the methods and thresholds applied in studies in the literature review. The number of studies that used each method or threshold is listed in the far right column.

Type | Acronym | Detail | Count
Threshold | µ – σ of presences | Mean minus the standard deviation of presences | 1
Threshold | µ & σ of P/E curves | Mean and standard deviation of the predicted to expected evaluation point frequency curves | 1
Threshold | 10% | 10th percentile of training presence | 9
Method | 2D hist | Surface fit to two dimensional histogram | 1
Method | ANN | Artificial Neural Networks | 4
Method | Bayesian joint response | Bayesian joint response | 1
Threshold | bins | SDM categorised by equally-spaced intervals | 2
Method | BIOCLIM | Bioclimatic Envelope | 6
Threshold | Boyce Index | Boyce Index | 1
Method | BPN | Back-propagation neural network | 1
Method | BRT | Boosted Regression Trees | 8
Method | BRUTO | BRUTO | 1
Threshold | BTOPAT | Balancing training omission, predicted area and threshold value | 1
Method | CA | Cellular automata | 1
Method | CLIMEX | CLIMEX | 2
Method | Correlative HSI | Correlative habitat suitability index | 1
Method | CT | Classification Trees | 1
Method | Domain | Domain | 3
Method | Dy-O-D | Dynamic Occupancy-Detection | 2
Method | ED | Euclidean distance | 1
Threshold | EETOD | Equating entropy of thresholded and original distribution | 1
Method | ENFA | Ecological Niche Factor Analysis | 5
Method | FE | Foraging Energetic model | 1
Threshold | fixed prob | A set, pre-determined probability | 5
Method | FUZZ | Fuzzy habitat preference model | 2
Method | GARP | Genetic Algorithm for Rule Set Production | 8
Method | GDM | Generalised dissimilarity modelling | 1
Method | GRM | Generalized regression models (GLM, GAM, GLMM, GAMM) | 50
Method | iCAR | Bayesian intrinsic conditional autoregressive model | 2
Method | IPS | Interacting Particle System | 1
Method | LDA | Linear Discriminant Analysis | 1
Method | LIVES | LIVES | 1
Method | MARS | Multivariate Adaptive Regression Splines | 2
Method | MARS-INT | Multivariate Adaptive Regression Splines with Interactions | 1
Threshold | max k | Maximise kappa | 2
Threshold | max s+s | Maximise sensitivity and specificity | 10
Method | MaxEnt | MaxEnt | 42
Method | MCE | Multi-Criteria Evaluation | 2
Method | MD | Mahalanobis distance | 1
Method | MDA | Mixture Discriminant Analysis | 2
Method | ME | Maximum Entropy (not MaxEnt) | 1
Threshold | MTP | Minimum training presence | 17
Method | NPPEN | Non Parametric Probabilistic Ecological Niche | 2
Method | O-D | Occupancy-Detection | 3
Method | OM-GARP | Genetic Algorithm for Rule Set Production, minimising Omission | 1
Method | Phenofit | Phenofit | 1
Threshold | quants | Quantiles | 6
Method | RF | Random Forest | 6
Method | RSPF | Resource selection probability function | 1
Method | Rule | Rule-based SDM | 3
Threshold | s=s | Sensitivity equals specificity | 11
Method | SAM | Species Archetype Model | 1
Threshold | sens = 95% | Set threshold where sensitivity = 95% | 2
Method | SVM | Support Vector Machines | 1
Threshold | top prop | Top proportion of cells, to match critical habitat map | 1
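
Several of the threshold rules listed in Table S1.2 are simple to state in code. The sketch below is illustrative only and is not taken from any of the reviewed studies: the function names, the example scores and the lower-rank percentile convention are all assumptions made for this example. It computes three of the rules (MTP, 10%, and max s+s) from model scores at known presence and absence sites.

```python
def sensitivity_specificity(threshold, presence_scores, absence_scores):
    """Sensitivity and specificity when cells scoring >= threshold are called 'present'."""
    sens = sum(s >= threshold for s in presence_scores) / len(presence_scores)
    spec = sum(s < threshold for s in absence_scores) / len(absence_scores)
    return sens, spec


def minimum_training_presence(presence_scores):
    """MTP: the lowest score found at any training presence."""
    return min(presence_scores)


def tenth_percentile_training_presence(presence_scores):
    """10%: score below which 10% of training presences fall (lower-rank convention, assumed)."""
    ordered = sorted(presence_scores)
    return ordered[len(ordered) // 10]


def max_sens_plus_spec(presence_scores, absence_scores):
    """max s+s: candidate threshold maximising sensitivity + specificity."""
    candidates = sorted(set(presence_scores) | set(absence_scores))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(t, presence_scores, absence_scores)),
    )


# Hypothetical model scores at surveyed sites (not data from the paper).
presences = [0.2, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9]
absences = [0.1, 0.15, 0.3, 0.4, 0.45, 0.6]

mtp = minimum_training_presence(presences)
p10 = tenth_percentile_training_presence(presences)
best = max_sens_plus_spec(presences, absences)
```

With these made-up scores the three rules give different cut-offs (MTP is much lower than the other two), which is one reason the choice of threshold type recorded in Table S1.1 matters.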


Appendix S2: Additional Figures

1) Lack of identifiability of species prevalence from presence-background (PB) data

To illustrate the lack of identifiability of species prevalence from PB data, we ran simulations where the site occupancy probability of a species depends on a single covariate ('cov'), simulated as uniformly distributed. We considered four cases:

• Cases 1&2: the simulated relationship exactly follows a logistic parametric form, with different coefficient values in case 1 and in case 2.
• Cases 3&4: same relationships as in cases 1&2 but scaled by 0.5 (i.e. these simulated relationships depart from a logistic function by a scaling factor).

For each case we ran 100 simulations, each simulating the random sampling of 500 sites assuming perfect detection. The resulting presence-absence (PA) datasets were analysed using GLM. We then discarded the absence records and analysed the resulting PB datasets (taking 10000 sites as background) using the method Maxlike (Royle et al. 2012), which was proposed as a means to estimate occupancy probabilities from PB data. In both types of analysis we allowed for linear and quadratic relationships. Figure S2.1 shows that Maxlike only estimates the (simulated) true relationship without bias when it follows exactly a logistic parametric form (Fig. S2.1, cases 1&2). Small departures from the assumed parametric form can lead to substantially biased estimation in Maxlike (Fig. S2.1, cases 3&4), but do not have this consequence when working with PA data. Even under the strong conditions where Maxlike could provide unbiased estimation, it is very imprecise, which again is an indication of lack of identifiability of species prevalence.

References

Royle J.A., Chandler R.B., Yackulic C. & Nichols J.D. (2012). Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions. Methods in Ecology and Evolution, 3, 545-554.
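
The reason a scaling factor cannot be seen in presence records can be sketched in a few lines. This is not the simulation code used for Figure S2.1: the coefficients `BETA0` and `BETA1`, the covariate range and the random seed are hypothetical values chosen only for illustration, and no GLM or Maxlike fit is performed. The sketch shows that, with a uniformly distributed covariate, the covariate pattern among presences is the same for a logistic relationship and for the same relationship scaled by 0.5, whereas the proportion of occupied sites in a presence-absence sample differs.

```python
import math
import random

# Hypothetical logistic coefficients and covariate range (assumptions, not the paper's values).
BETA0, BETA1 = -1.0, 2.0
COV_MIN, COV_MAX = -1.0, 1.0


def occupancy(cov, scale=1.0):
    """Occupancy probability: a logistic curve multiplied by a scaling factor."""
    return scale / (1.0 + math.exp(-(BETA0 + BETA1 * cov)))


def presence_covariate_density(grid, scale):
    """Relative frequency of covariate values among occupied sites (uniform covariate)."""
    psi = [occupancy(c, scale) for c in grid]
    total = sum(psi)
    return [p / total for p in psi]


def simulated_pa_prevalence(scale, n_sites=500, seed=1):
    """Proportion of occupied sites in a random presence-absence sample, perfect detection."""
    rng = random.Random(seed)
    occupied = 0
    for _ in range(n_sites):
        cov = rng.uniform(COV_MIN, COV_MAX)
        if rng.random() < occupancy(cov, scale):
            occupied += 1
    return occupied / n_sites


grid = [COV_MIN + i * (COV_MAX - COV_MIN) / 200 for i in range(201)]

# Presence records look the same with or without the 0.5 scaling ...
dens_logistic = presence_covariate_density(grid, 1.0)
dens_scaled = presence_covariate_density(grid, 0.5)
same_presence_pattern = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(dens_logistic, dens_scaled)
)

# ... whereas presence-absence data reveal the difference in prevalence.
prev_logistic = simulated_pa_prevalence(1.0)
prev_scaled = simulated_pa_prevalence(0.5)
```

Because the scaling cancels when the presence pattern is normalised, any method working from presences and background alone has to rely on the assumed parametric form to pin down prevalence.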


Figure S2.1: Lack of identifiability of species prevalence from presence-background (PB) data. Each row represents one of the simulated scenarios described above. In cases 1&2, the "true" relationship between occupancy and the covariate 'cov' (dashed line) follows exactly a logistic parametric form. In cases 3&4 the "true" relationship departs from a logistic function by a scaling factor. Plots summarize the estimated relationships obtained from 100 simulations: the solid line represents the median of the estimates, and the shaded region is delimited by the 5th and 95th percentiles. The estimation obtained from Maxlike is very imprecise, and it is biased when there are departures from the assumed parametric form (Maxlike cannot distinguish case 1 from case 3, or case 2 from case 4). This shows that, in practice, prevalence cannot be estimated from PB data (i.e. PB data do not allow estimating actual occupancy probabilities). PA data allow estimating prevalence; PA methods can deal reasonably well with these scenarios, even if there are departures from the assumed parametric form.


2) True Presence-Absence maps and binary conversion of probabilistic occupancy maps

Figure S2.2: "True" presence/absence map of a virtual species and maps obtained based on the occupancy probabilities estimated by an SDM, with and without binary conversion ('th' denotes the probability level chosen as threshold). Binary conversion of the probabilistic output does not recover the original presence/absence map, as clearly shown in this example. Binary maps are less informative than the probabilistic output.
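
The information loss caused by binary conversion can be seen with a handful of cells. The sketch below is a toy illustration, not the virtual species used for Figure S2.2: the probabilities and the threshold of 0.5 are hypothetical, and treating the sum of occupancy probabilities as the expected number of occupied cells is the assumption made here.

```python
def to_binary(probabilities, threshold):
    """Binary conversion: cells at or above the threshold are mapped as 'present'."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical occupancy probabilities for five cells.
probabilities = [0.05, 0.30, 0.45, 0.55, 0.95]

binary_map = to_binary(probabilities, threshold=0.5)

# Expected number of occupied cells implied by the probabilities ...
expected_occupied = sum(probabilities)
# ... versus the number of cells mapped as present after thresholding.
mapped_occupied = sum(binary_map)
```

After conversion the cells with probabilities 0.55 and 0.95 are indistinguishable, as are those with 0.05 and 0.45, and the count of "present" cells no longer matches the expected number of occupied cells.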


Appendix S3: Assessment of ecological, biogeographical and conservation applications

Summary

We considered a broad range of applications where species distribution models (SDMs) are used across four domains: (i) management of invasive species, (ii) management of threatened species, (iii) spatial planning, and (iv) ecological and biogeographical inference. We assessed the requirements of these applications in terms of the information that SDMs should provide, evaluating whether estimates of relative likelihoods of species occupancy or rankings would suffice or whether, instead, they need estimates of occupancy probabilities. We also considered whether binary conversion of SDM outputs has been used in each type of application (providing examples from the published peer-reviewed literature where available) and described what the consequences of doing so can be.

Our assessment is comprehensive but likely not exhaustive, as the applications of SDMs are multiple and varied. Note also that we do not specifically cover the management of disease spread or the management of harvested species, but applications in these domains have parallels with applications in the management of invasive species and threatened species, respectively. Thus the explanations given in the tables can offer guidance within those domains, although they are not directly treated here.

Key and clarifications

• "Columns 1-3" refer to Fig. 1a in the main text.
• Refs: examples of applications. We indicate with an asterisk (*) references that are not themselves concrete examples of applications, but where such examples are cited/listed or where an application is proposed or discussed.
• S/M: single or multiple species. This refers to whether the calculations involve SDMs from several species, e.g. by combining or prioritizing them. We do not consider as multi-species those applications where, although several species may be considered, calculations are independent for each of them.
• AOO = Area of Occupancy.
• [-] = Example not found in the peer-reviewed literature.


Table S3.1: Examples of SDM usage in the management of invasive species

Application | SDM usage (input to…) | Refs | S/M | Consequences of biased occupancy estimation (Column 1) | Is relative likelihood or ranking a sufficient input? (Columns 2 & 3) | Has binary conversion been used? | Consequences of using a binary conversion
Target management actions (e.g. surveys or control) | Rank sites according to habitat suitability. | [1] | S | Efforts to survey or control invasion may be spatially misguided. | Both suitable if only interested in identifying best sites. Ranking not sufficient if aim to identify sites within a certain suitability compared to best sites. | Yes [2] | A 'threshold' may be used to delimit best sites but binary conversion leads to loss of information.
Optimal surveillance [see Case Study 3] | Input to optimization: find surveillance level that minimizes overall costs (monitoring + management) given site occupancy probability. | [3] | S | Overall costs will increase, as the level of surveillance guiding early management is not optimal. | Computing expected costs requires probabilities; can result in suboptimal solution otherwise. | No | Not valid. Method based on acknowledging uncertainty in species presence to evaluate trade-offs in expected costs.
Invasive species monitoring or status assessment [see Case Study 1] | Calculate species' AOO or compare temporal changes in AOO (by fitting models at different points in time). | [*4, p.41] | S | Invasion status may be incorrectly assessed due to biased estimation of AOO or its temporal trends (real trend missed or spurious trend detected). | Relative likelihoods & rankings not able to estimate AOO, and not sufficient input to track temporal trends (as not comparable across time; except for PA data when detectability constant in space and time). | Yes [5] | Degrades AOO estimation, even when probabilities are estimated; it does not solve the inadequacy of relative likelihoods or rankings.
Importation decisions (pre-border risk assessment) | SDMs fitted with data from other areas used to calculate potential AOO, for categorizing risk (e.g. low, medium, high), sometimes within a scoring system that also considers non-spatial attributes (e.g. weed risk assessments). | [6] | S | May result in beneficial species being denied entry, or harmful species being imported. | Relative likelihoods and rankings are not suitable for estimating AOO. | Yes [7,8] | Degrades AOO estimation, even when probabilities are estimated; it does not solve the inadequacy of relative likelihoods or rankings.
Post-border cost-sharing arrangements | SDMs fitted with data from other areas used to calculate the proportion of potential AOO within each jurisdiction. | [9] | S | Could lead to inappropriate allocation of management costs across jurisdictions. | Relative likelihood is sufficient, as the proportion of AOO is not affected by a constant scaling. Correct site ranking does not guarantee appropriate cost sharing. | [9] | It degrades the estimation of proportions of AOO.

Note: We discuss the projection of a model to different scenarios at the bottom of the table (i.e. "Assessment of the impact of climate/land use change on invasive species").
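
The distinction drawn in Table S3.1 between proportions of AOO (unaffected by a constant scaling) and absolute AOO (affected) can be checked numerically. The sketch below is only an illustration under stated assumptions: the jurisdiction names, the cell values, the scaling constant of 0.4 and the use of summed probabilities as expected AOO (in cell units) are all hypothetical choices for this example.

```python
def aoo_by_jurisdiction(values):
    """Expected AOO (in cell units) per jurisdiction, taken as the sum of cell values."""
    return {name: sum(cells) for name, cells in values.items()}


def aoo_shares(values):
    """Proportion of the total expected AOO falling within each jurisdiction."""
    totals = aoo_by_jurisdiction(values)
    grand_total = sum(totals.values())
    return {name: total / grand_total for name, total in totals.items()}


# Hypothetical occupancy probabilities for cells in two jurisdictions.
occupancy = {"A": [0.8, 0.6, 0.2], "B": [0.3, 0.1]}

# A relative likelihood: proportional to occupancy by an unknown constant (0.4 assumed here).
relative = {name: [0.4 * p for p in cells] for name, cells in occupancy.items()}

shares_from_occupancy = aoo_shares(occupancy)
shares_from_relative = aoo_shares(relative)

total_from_occupancy = sum(aoo_by_jurisdiction(occupancy).values())
total_from_relative = sum(aoo_by_jurisdiction(relative).values())
```

Here both inputs allocate 80% of the potential AOO to jurisdiction A, so a cost-sharing rule based on proportions is unaffected, while the absolute totals differ (2.0 versus 0.8 cells), which is why applications that need AOO itself require occupancy probabilities.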

Table S3.1 cont'd: Examples of SDM usage in the management of invasive species

Application | SDM usage (input to…) | Refs | S/M | Consequences of biased occupancy estimation (Column 1) | Is relative likelihood or ranking a sufficient input? (Columns 2 & 3) | Has binary conversion been used? | Consequences of using a binary conversion
Post-border prioritization lists for control or eradication [see Case Study 2] | Calculate potential AOO for several species to prioritize management resources across them. | [10] | M | Management effort may be allocated to species that actually have low potential for spread and vice versa. | Relative likelihoods and rankings are not sufficient because they are not comparable across species. | Yes [10] | Degrades AOO estimation, even when probabilities are estimated; it does not solve the inadequacy of relative likelihoods or rankings.
Evaluate management options or scenarios, in terms of the probability of eradicating a species over some defined time horizon | Input to spatially-explicit metapopulation modelling methods (e.g. RAMAS GIS) to: 1. Define the spatial extent of suitable habitat (or 'patches'), by choosing a threshold below which environments are considered 'unsuitable'. 2. Set the carrying capacity of 'suitable' patches as a function of suitability. 3. Set vital rates at 'suitable' patches as a function of suitability. | [11] | S | Poor estimates of probability of eradication and therefore choice of management options. | (1) Using relative likelihood allows delineating patches of distinct suitability, but can be problematic because the choice of threshold is arbitrary (i.e. not based on observed prevalence). Occupancy probabilities provide a more natural way to determine the spatial structure of the metapopulation. (2-3) Can be problematic (as can be the use of occupancy probabilities) unless a good model can be developed to link relative likelihood to carrying capacity / vital rates (e.g. using other data). Ideally one would directly model 'maximum abundance' or vital rates as a function of environmental covariates. | (1) Yes. Needed by definition. (2-3) No | (1) Needed to define patches (but note that, if suitability does not indicate sharp boundaries, a patch-based metapopulation model may not be appropriate). (2-3) Not valid, non-thresholded values are needed to distinguish the suitability of patches.
(same application as the row above) | Input to spatially explicit individual-based, cellular automata or multi-agent model, to determine the performance of individuals in each cell in each time step. | [12] | S | Poor estimates of probability of eradication and therefore choice of management options. | Can be problematic (as can be the use of occupancy probabilities) unless a good model can be developed to link them to whichever rules govern the performance of individuals within cells. | No | Not valid, non-thresholded values are needed.
Assessment of the impact of climate/land use change on invasive species | SDM fitted with current data used to make predictions under changing environmental conditions (e.g. based on projected scenarios). | [13, 14] | S | Incorrect evaluation of impacts of climate/land use change, as both current and projected species range are incorrectly estimated. | Relative likelihood suitable to evaluate relative change but not to compute AOO and derived quantities such as species persistence. A ranking has the risk of missing critical isoclines/thresholds and lead to incorrect impact predictions. | Yes [15,16] | Degrades AOO estimation, even when probabilities are estimated; it does not solve the inadequacy of relative likelihoods or rankings.

Table S3.2: Examples of SDM usage in the management of threatened species and communities

Application | SDM usage (input to…) | Refs | S/M | Consequences of biased occupancy estimation (Column 1) | Is relative likelihood or ranking a sufficient input? (Columns 2 & 3) | Has binary conversion been used? | Consequences of using a binary conversion
Locate undiscovered populations | Rank sites according to habitat suitability to direct surveys. | [17] | S | Survey efforts may be misguided to suboptimal habitats, reducing chances of encountering new populations. | Both suitable if only interested in identifying best sites. Ranking not enough if aim to identify sites within a certain suitability compared to the best. | Yes [18] | A 'threshold' may be used to delimit best sites but binary conversion leads to loss of information.
Locate best sites for conservation translocations | Rank sites according to habitat suitability to identify best candidate areas. | [*19, 20] | S | Species may be translocated to suboptimal/unsuitable areas. | Both suitable if only interested in identifying best sites. Ranking not enough if aim to identify sites within a certain suitability compared to the best. | Yes [21] | A 'threshold' may be used to delimit best sites but binary conversion leads to loss of information.
Trigger conservation translocations | Calculate current species' AOO, predict future AOO or estimate temporal changes in AOO (by fitting models at different points in time); a given AOO or AOO decline triggers translocation. | [22] | S | May unnecessarily trigger translocation, or fail to trigger it when needed. | Relative likelihoods & rankings not able to estimate AOO, and not sufficient input to track temporal trends (as not comparable across time; except for PA data when detectability constant in space and time). | [22]‡ | Degrades AOO estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.
Species monitoring / status assessment (e.g. IUCN Red List categorization) [see Case Study 1] | Calculate species' AOO or compare temporal changes in AOO (by fitting models at different points in time). | [*4, p.41] | S | Species conservation status may be incorrectly assessed (e.g. wrong IUCN category) due to biased estimation of AOO or its temporal trends (real trend missed or spurious trend detected). | Relative likelihoods & rankings not able to estimate AOO, and not sufficient input to track temporal trends (as not comparable across time; except for PA data when detectability constant in space and time). | Yes [23] | Degrades AOO estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.
Biodiversity offsets | Offsets are often based on metrics of vegetation condition, but SDMs can be useful to determine locations of high suitability to comprise offsets or areas for restoration, especially in cases where offsetting has a single-species focus (e.g. SDM estimates used to compute expected AOO within an area). | [*24, 25, 26] | S/M | Chosen offset locations may fail to compensate for the biodiversity loss at the original site. | Relative likelihoods may provide useful guidance for selecting offset locations (i.e. best sites, or sites comparable to those subject to development). However, absolute probabilities may also be required, e.g. to compare expected AOO across different areas, or to compare expected AOO with an absolute target set to compensate a given known loss. | [26]‡ | Degrades AOO estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.

Note: We discuss the projection of a model to different scenarios at the bottom of the table (i.e. "Assessment of the impact of climate/land use change on threatened species").
‡ Use directly a binary model.

Table S3.2 cont'd: Examples of SDM usage in the management of threatened species and communities

Application | SDM usage (input to…) | Refs | S/M | Consequences of biased occupancy estimation (Column 1) | Is relative likelihood or ranking a sufficient input? (Columns 2 & 3) | Has binary conversion been used? | Consequences of using a binary conversion
Evaluate management options or scenarios, in terms of species extinction risk or expected minimum abundance over some defined time horizon | Input to spatially-explicit metapopulation modeling methods (e.g. RAMAS GIS) to: (1) define the spatial extent of suitable habitat (or 'patches'), by choosing a threshold below which environments are considered 'unsuitable'; (2) set the carrying capacity of 'suitable' patches as a function of suitability; (3) set vital rates at 'suitable' patches as a function of suitability. | [27-30] | S | Poor estimates of expected minimum abundance / extinction risk and therefore choice of management options. | (1) Using relative likelihood allows delineating patches of distinct suitability, but can be problematic because the choice of threshold is arbitrary (i.e. not based on observed prevalence). Occupancy probabilities provide a more natural way to determine the spatial structure of the metapopulation. (2-3) Can be problematic (as can be the use of occupancy probabilities) unless a good model can be developed to link them to carrying capacity / vital rates (e.g. using other data). Ideally one would directly model 'maximum abundance' or vital rates as a function of environmental covariates. | (1) Yes. Needed by definition. (2-3) No | (1) Needed to define patches (but note that, if suitability does not indicate sharp boundaries, a patch-based metapopulation model may not be appropriate). (2-3) Not valid, non-thresholded values are needed to distinguish the suitability of patches.
(same application as the row above) | Input to spatially-explicit individual-based, cellular automata or multi-agent model, to determine the performance of individuals in each cell in each time step. | [31] | S | Poor estimates of expected minimum abundance / extinction risk and therefore choice of management options. | Can be problematic (as can be the use of occupancy probabilities) unless a good model can be developed to link them to whichever rules govern the performance of individuals within cells. | No | Not valid, non-thresholded values are needed.
Assessment of the impact of climate/land use change on threatened species (e.g. range shifts, historical patterns, refugia analysis) | SDM fitted with current data used to make predictions under changing environmental conditions (e.g. based on projected scenarios). | [32] | S | Incorrect evaluation of impacts of climate/land use change, as both current and projected species range are incorrectly estimated. | Relative likelihood suitable to evaluate relative change but not to compute AOO and derived quantities such as species persistence. A ranking has the risk of missing critical isoclines/thresholds and lead to incorrect impact predictions. | Yes [16,33,34] | Degrades AOO estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.

Table S3.3: Examples of SDM usage in spatial planning

Application: Spatial prioritization (reserve selection) [see Case Study 5] [Note: multiple methods exist; we discuss the two main families of objective functions using two commonly used programs as examples: 1) maximum coverage/utility (e.g. Zonation), 2) minimum set (e.g. Marxan)]

SDM usage (input to…): Input layer to Zonation algorithm, for calculating the % of distribution that is lost for each cell removed.
Refs: [35]
S/M: M
Consequences of biased occupancy estimation (Column 1): Can lead to an under-performing prioritization. Top priorities may not represent the best possible sites but those where the likelihood of observing species was highest.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Relative likelihood is sufficient as Zonation determines the value of sites in terms of their relative contribution to the distribution of species (% of distribution). A ranking that does not capture the correct shape can lead to an under-performing prioritization.
Has binary conversion been used? Yes [36]
Consequences of using a binary conversion: Resulting information loss may lead to an under-performing prioritization.

SDM usage (input to…): Input layer to Marxan algorithm, for evaluating constraint (target).
Refs: [37-39]
S/M: M
Consequences of biased occupancy estimation (Column 1): Can lead to an under-performing prioritization, with some targets not met and/or solution involving greater cost than needed. Prioritization will favor sites where the likelihood of observing species is highest, rather than the best sites.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Relative likelihood is fine if targets are set in terms of the % of distribution to be retained, but not if set as absolute areas. A ranking that does not capture the correct shape can lead to an under-performing prioritization.
Has binary conversion been used? Yes [38]
Consequences of using a binary conversion: Resulting information loss may lead to an under-performing prioritization. Effect more pronounced when targets are set as absolute areas (as opposed to % of distributions).

Application: Reserve assessment and gap-analysis
SDM usage (input to…): Input to calculate the degree of overlap of species distributions with protected areas to evaluate whether targets (e.g. % of distribution or in/out of protected areas) are met.
Refs: [40-42]
S/M: S
Consequences of biased occupancy estimation (Column 1): Can lead to misleading assessment, incorrect prioritization of areas and species for future protection and suboptimal spending (e.g. sampling is often biased towards protected areas, so reserve systems may appear to cover greater proportions of distributions than they actually do).
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Relative likelihood is sufficient if target set as % of distribution covered by reserves, as the proportion of AOO is not affected by a constant scaling; not sufficient if target expressed as absolute AOO contained in the reserve system. Correct site ranking does not guarantee appropriate assessment.
Has binary conversion been used? Yes [40-42]
Consequences of using a binary conversion: Resulting information loss may lead to an incorrect assessment.
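The scaling argument in the reserve assessment entry can be illustrated with a minimal Python sketch. The site values, reserve membership and scaling constant below are hypothetical and chosen only for illustration; the sketch assumes that a relative likelihood equals the occupancy probability multiplied by an unknown constant.

```python
# Hypothetical occupancy probabilities for six sites (illustrative values only).
prob = [0.8, 0.6, 0.4, 0.2, 0.1, 0.1]
# Assumed reserve membership: sites 0 and 2 lie inside the reserve system.
in_reserve = [True, False, True, False, False, False]
# Assumed unknown constant linking relative likelihood to probability.
scale = 0.25
relative = [scale * p for p in prob]


def covered_absolute(values, mask):
    """Sum of SDM values inside reserves (expected AOO if values are probabilities)."""
    return sum(v for v, m in zip(values, mask) if m)


def covered_proportion(values, mask):
    """Share of the summed SDM values that falls inside reserves."""
    return covered_absolute(values, mask) / sum(values)


prop_prob = covered_proportion(prob, in_reserve)
prop_rel = covered_proportion(relative, in_reserve)
abs_prob = covered_absolute(prob, in_reserve)
abs_rel = covered_absolute(relative, in_reserve)

print("proportion covered:", prop_prob, prop_rel)  # agree up to rounding
print("absolute AOO covered:", abs_prob, abs_rel)  # differ by the constant
```

Under these assumptions a target expressed as a percentage of the distribution is evaluated identically with either input, whereas a target expressed as absolute AOO is not.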

Table S3.3 cont'd: Examples of SDM usage in spatial planning

Application: Corridor/connectivity planning
SDM usage (input to…): SDM estimates used in movement/dispersal modeling e.g. to determine 'path cost' in least-cost path analysis or 'resistance' in circuit theory-based modeling. (Note: also sometimes used to define patches to be connected).
Refs: [43-45]
S/M: S
Consequences of biased occupancy estimation (Column 1): Corridors may be planned in suboptimal areas, esp. if sampling bias leads to areas around roads being ranked high in terms of suitability.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Relative values are fine to identify the best path.
Has binary conversion been used? No
Consequences of using a binary conversion: Not valid, non-thresholded values are needed to distinguish the suitability of sites.

Application: Identification of critical areas (e.g. to prioritize for protection or development)
SDM usage (input to…): Prioritize sites according to SDM suitability estimates.
Refs: [*46, p.1427]
S/M: S
Consequences of biased occupancy estimation (Column 1): Conservation efforts may be misguided to suboptimal habitat or development allowed to proceed in good habitat for the species.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Both suitable if only interested in identifying best/worst sites. Ranking not enough if aim to identify sites within a certain suitability compared to the best/worst.
Has binary conversion been used? Yes [47]
Consequences of using a binary conversion: A 'threshold' may be used to delimit best sites but binary conversion leads to loss of information.

Application: Identification of regional/global biodiversity hotspots for prioritization [see Case Study 4]
SDM usage (input to…): Stack single-species SDMs to estimate species richness at sites, or to rank sites within a region in terms of their species richness.
Refs: [*48, 49]
S/M: M
Consequences of biased occupancy estimation (Column 1): Areas where the likelihood of observing species is highest (rather than the best habitats) may be incorrectly identified as biodiversity hotspots.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Relative likelihood not sufficient to estimate absolute richness, but may lead to robust site ranking in some scenarios when comparing sites within the same modeled region (but estimation of absolute richness is required for comparing sites fitted in separate models).
Has binary conversion been used? Yes [*48,50-52]
Consequences of using a binary conversion: Degrades richness estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.
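The effect of binary conversion on stacked richness estimates can be sketched in a few lines of Python. The probabilities and the threshold below are hypothetical; the sketch only contrasts the two ways of stacking and is not the procedure used in the cited studies.

```python
# Hypothetical occupancy probabilities: one row per species, one column per site.
probs = [
    [0.9, 0.3, 0.1],
    [0.6, 0.4, 0.2],
    [0.7, 0.3, 0.1],
]
threshold = 0.25  # arbitrary cut-off, assumed for illustration only

n_sites = len(probs[0])

# Expected richness at each site: sum of probabilities across species.
richness_prob = [sum(sp[j] for sp in probs) for j in range(n_sites)]

# Richness from stacked binary maps: count species at or above the threshold.
richness_bin = [sum(1 for sp in probs if sp[j] >= threshold) for j in range(n_sites)]

print("summed probabilities:", richness_prob)
print("stacked binary maps: ", richness_bin)
```

With these illustrative values the first two sites receive the same richness from the binary stack although their expected richness differs by more than a factor of two.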

Table S3.4: Examples of SDM usage in ecological and biogeographical inference

Application: Inference about environmental relationships
SDM usage (input to…): Ecological inference drawn from the environmental relationships estimated by the SDM (less emphasis on its geographical projection).
Refs: [53]
S/M: S
Consequences of biased occupancy estimation (Column 1): Can lead to wrong inferences regarding environmental relationships (e.g. processes determining detectability may be mistaken as driving occupancy).
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Depends on the objective. Relative likelihood is fine if interested only in relative suitability, but not if the aim is to study the true relationship in probabilistic terms.
Has binary conversion been used? No
Consequences of using a binary conversion: Not valid since the interest is in the shape of the relationship, which is lost in this process.

Application: Tests of niche conservatism
SDM usage (input to…): An SDM calibrated in one area or timeframe is either:
1) used to predict a species' distribution in another area, or to predict the distribution of a different but related species; predictive performance in the new scenario is then evaluated.
2) compared to an SDM calibrated in another area, timeframe or of a different but related species (e.g. with some quantitative metric of environmental overlap).
Refs: [54, 55]
S/M: S/M
Consequences of biased occupancy estimation (Column 1): May reduce predictive performance and hence lead to invalid inferences regarding niche conservatism/model transferability.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3):
1) Relative likelihood or correct ranking may be sufficient if the aspect of predictive performance that is of interest is model discrimination, but not if it is model calibration.
2) Relative likelihood is not sufficient when the quantitative method used to compare the two predicted distributions expects probabilities.
Has binary conversion been used? 1) Yes [55]; 2) Yes [54]
Consequences of using a binary conversion:
1) Using binary performance metrics favors binary models (i.e. binary conversion of predictions), which have reduced discrimination and calibration [see details in 56].
2) Degrades calculation of overlap, and does not solve the inadequacy of relative likelihoods or rankings.

Application: Inference about phylogenetic patterns
SDM usage (input to…): SDM for one or more species linked to phylogenetic data to understand, test or generate hypotheses about:
1) Biogeography (e.g. speciation or refugia): by e.g. visual inspection of SDMs or using SDMs to simulate data across time.
2) Community ecology: by calculating indices of site-level phylogenetic diversity based on SDM values.
Refs: [57-60]
S/M: S/M
Consequences of biased occupancy estimation (Column 1): Can induce unrealistic hypotheses or lead to wrong inferences regarding the hypotheses studied (e.g. location of refugia, spatial patterns of speciation or phylogenetic beta diversity).
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3):
1) Depending on the method, relative likelihood may be sufficient for single-species analyses.
2) Approaches across species are too varied for one answer, but problems estimating phylogenetic diversity are likely in many instances e.g. for the same reasons as in the estimation of species richness (see below).
Has binary conversion been used? 1) Yes [60,61]; 2) Yes [62]
Consequences of using a binary conversion: Degrades estimation compared to working with probabilities, and does not solve the inadequacy of relative likelihoods or rankings.

Table S3.4 cont'd: Examples of SDM usage in ecological and biogeographical inference

Application: Inference about species richness [see Case Study 4]
SDM usage (input to…): Stack single-species SDMs to estimate species richness at sites, or to rank sites within a region in terms of their species richness. Outcome used as input for further ecological analyses.
Refs: [*48]
S/M: M
Consequences of biased occupancy estimation (Column 1): Areas where the likelihood of observing species is highest (rather than the best habitats) may be incorrectly identified as species-rich.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3): Relative likelihood not sufficient to estimate absolute richness, but may lead to robust site ranking in some scenarios when comparing sites within the same modeled region (but estimation of absolute richness is required for comparing sites fitted in separate models).
Has binary conversion been used? Yes [*48,63,64]
Consequences of using a binary conversion: Degrades richness estimation, even when probabilities are estimated, and does not solve the inadequacy of relative likelihoods or rankings.

Application: Inference about species co-occurrence (and potential interactions)
SDM usage (input to…): 2 main types of SDM-based methods:
1) Include SDM predictions (or presence/absence records) from one or several species as covariates when fitting an SDM for another species.
2) Study residual correlation among species occurrences, e.g. combining species in a joint SDM.
Refs: (1) [65, 66]; (2) [67-69]
S/M: M
Consequences of biased occupancy estimation (Column 1): Can lead to incorrect inferences regarding species co-occurrence and interactions.
Is relative likelihood or ranking a sufficient input? (Columns 2 & 3):
(1) Relative likelihoods may be sufficient to be used as predictors (but the magnitude of regression coefficients would change with a scaling).
(2) Currently developed methods are based on estimating occupancy probabilities.
Has binary conversion been used? (1) Yes [70]; (2) [-]
Consequences of using a binary conversion:
(1) Resulting information loss may lead to an incorrect assessment of species co-occurrence.
(2) Would degrade estimation compared to working with continuous predictions.
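The remark that the magnitude of regression coefficients changes with a scaling of the predictor can be checked with a small Python sketch. It uses an ordinary least-squares slope as a simplified stand-in for an SDM coefficient, and all values, names and the scaling constant are hypothetical.

```python
def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical SDM predictions for species A, used as a covariate.
pred_a = [0.1, 0.3, 0.5, 0.7, 0.9]
# Hypothetical response for species B at the same sites.
resp_b = [0.2, 0.1, 0.4, 0.6, 0.7]
# Assumed unknown constant turning probabilities into relative likelihoods.
scale = 0.5

slope_prob = ols_slope(pred_a, resp_b)
slope_rel = ols_slope([scale * p for p in pred_a], resp_b)

print(slope_prob, slope_rel)  # the second slope is the first divided by `scale`
```

In this simplified setting the sign of the relationship is preserved, so the direction of the association is still recoverable, but the coefficient cannot be interpreted on the probability scale.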

References

1. Jarnevich CS, Reynolds LV (2011) Challenges of predicting the potential distribution of a slow-spreading invader: a habitat suitability map for an invasive riparian tree. Biological Invasions 13: 153-163.
2. Richter R, Dullinger S, Essl F, Leitner M, Vogl G (2013) How to account for habitat suitability in weed management programmes? Biological Invasions 15: 657-669.
3. Hauser CE, McCarthy MA (2009) Streamlining 'search and destroy': cost-effective surveillance for invasive species management. Ecology Letters 12: 683-692.
4. MacKenzie DI, Nichols JD, Royle JA, Pollock KH, Bailey LL, et al. (2006) Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence. New York: Academic Press.
5. Urban MC, Phillips BL, Skelly DK, Shine R (2007) The cane toad's (Chaunus [Bufo] marinus) increasing ability to invade Australia is revealed by a dynamically updated range model. Proceedings of the Royal Society B-Biological Sciences 274: 1413-1419.
6. Pheloung PC (1995) Determining the weed potential of new plant introductions to Australia. Department of Agriculture, Perth, Australia.
7. Bomford M, Kraus F, Braysher M, Walter L, Brown L (2005) Risk assessment model for the import and keeping of exotic reptiles and amphibians. Bureau of Rural Sciences for The Department of Environment and Heritage.
8. van Wilgen NJ, Roura-Pascual N, Richardson DM (2009) A Quantitative Climate-Match Score for Risk-Assessment Screening of Reptile and Amphibian Introductions. Environmental Management 44: 590-607.
9. Governments CoA (2012) National Environmental Biosecurity Response Agreement 13/01/2012.
10. Lizzio J, Richmond L, Mewett O, Hennecke B, Baker J, et al. (2009) Methodology to prioritise Weeds of National Significance (WoNS) candidates. Bureau of Rural Sciences, Canberra.
11. Fordham DA, Akcakaya HR, Araujo MB, Brook BW (2012) Modeling range shifts for invasive vertebrates in response to climate change. In: Brodie JF, Post ES, Doak DF, editors. Wildlife Conservation in a Changing Climate. Chicago: University of Chicago Press.
12. Goslee SC, Peters DPC, Beck KG (2006) Spatial prediction of invasion success across heterogeneous landscapes using an individual-based model. Biological Invasions 8: 193-200.
13. Broennimann O, Guisan A (2008) Predicting current and future biological invasions: both native and invaded ranges matter. Biology Letters 4: 585-589.
14. Roura-Pascual N, Suarez AV, Gomez C, Pons P, Touyama Y, et al. (2004) Geographical potential of Argentine ants (Linepithema humile Mayr) in the face of global climate change. Proceedings of the Royal Society B-Biological Sciences 271: 2527-2534.
15. Beerling DJ (1993) The Impact of Temperature on the Northern Distribution-Limits of the Introduced Species Fallopia-Japonica and Impatiens-Glandulifera in North-West Europe. Journal of Biogeography 20: 45-53.
16. Nenzen HK, Araujo MB (2011) Choice of threshold alters projections of species range shifts under climate change. Ecological Modelling 222: 3346-3354.
17. Jarvis A, Williams K, Williams D, Guarino L, Caballero PJ, et al. (2005) Use of GIS for optimizing a collecting mission for a rare wild pepper (Capsicum flexuosum Sendtn.) in Paraguay. Genetic Resources and Crop Evolution 52: 671-682.
18. Guisan A, Broennimann O, Engler R, Vust M, Yoccoz NG, et al. (2006) Using niche-based models to improve the sampling of rare species. Conservation Biology 20: 501-511.
19. Regan HM, Syphard AD, Franklin J, Swab RM, Markovchick L, et al. (2012) Evaluation of assisted colonization strategies under global change for a rare, fire-dependent plant. Global Change Biology 18: 936-947.
20. Osborne PE, Seddon PJ (2012) Selecting suitable habitat for reintroductions: variation, change and the role of species distribution modelling. In: Ewen JG, Armstrong DP, Parker KA, editors. Reintroduction Biology: Integrating Science and Management. Chichester, UK: Wiley Blackwell. pp. 73-104.

21. Willis SG, Hill JK, Thomas CD, Roy DB, Fox R, et al. (2009) Assisted colonization in a changing climate: a test-study using two UK butterflies. Conservation Letters 2: 45-51.
22. McLane SC, Aitken SN (2012) Whitebark pine (Pinus albicaulis) assisted migration potential: testing establishment north of the species range. Ecological Applications 22: 142-153.
23. Siders ZA, Westgate AJ, Johnston DW, Murison LD, Koopman HN (2013) Seasonal Variation in the Spatial Distribution of Basking Sharks (Cetorhinus maximus) in the Lower Bay of Fundy, Canada. Plos One 8.
24. Bull JW, Suttle KB, Gordon A, Singh NJ, Milner-Gulland EJ (2013) Biodiversity offsets in theory and practice. Oryx 47: 369-380.
25. Gordon A, Langford WT, Todd JA, White MD, Mullerworth DW, et al. (2011) Assessing the impacts of biodiversity offset policies. Environmental Modelling & Software 26: 1481-1488.
26. Kiesecker JM, Copeland H, Pocewicz A, Nibbelink N, Mckenney B, et al. (2009) A Framework for Implementing Biodiversity Offsets: Selecting Sites and Determining Scale. Bioscience 59: 77-84.
27. Wintle BA, Bekessy SA, Venier LA, Pearce JL, Chisholm RA (2005) Utility of dynamic-landscape metapopulation models for sustainable forest management. Conservation Biology 19: 1930-1943.
28. Gordon A, Wintle BA, Bekessy SA, Pearce JL, Venier LA, et al. (2012) The use of dynamic landscape metapopulation models for forest management: a case study of the red-backed salamander. Canadian Journal of Forest Research-Revue Canadienne De Recherche Forestiere 42: 1091-1106.
29. Bekessy SA, Wintle BA, Gordon A, Fox JC, Chisholm R, et al. (2009) Modelling human impacts on the Tasmanian wedge-tailed eagle (Aquila audax fleayi). Biological Conservation 142: 2438-2448.
30. Akcakaya HR, Radeloff VC, Mladenoff DJ, He HS (2004) Integrating landscape and metapopulation modeling approaches: Viability of the sharp-tailed grouse in a dynamic landscape. Conservation Biology 18: 526-537.
31. DeAngelis DL, Shuter BJ, Ridgeway MS, Blanchfield P, Friesen T, et al. (1991) Modeling early life-history stages of smallmouth bass in Ontario lakes. Transaction of the American Fisheries Society, 9-11.
32. Erasmus BFN, Van Jaarsveld AS, Chown SL, Kshatriya M, Wessels KJ (2002) Vulnerability of South African animal taxa to climate change. Global Change Biology 8: 679-693.
33. Wisz M, Dendoncker N, Madsen J, Rounsevell M, Jespersen M, et al. (2008) Modelling pink-footed goose (Anser brachyrhynchus) wintering distributions for the year 2050: potential effects of land-use change in Europe. Diversity and Distributions 14: 721-731.
34. Johnston KM, Freund KA, Schmitz OJ (2012) Projected range shifting by montane mammals under climate change: implications for Cascadia's National Parks. Ecosphere 3.
35. Kremen C, Cameron A, Moilanen A, Phillips SJ, Thomas CD, et al. (2008) Aligning conservation priorities across taxa in Madagascar with high-resolution planning tools. Science 320: 222-226.
36. Simaika JP, Samways MJ, Kipping J, Suhling F, Dijkstra KDB, et al. (2013) Continental-scale conservation prioritization of African dragonflies. Biological Conservation 157: 245-254.
37. Carwardine J, Wilson KA, Ceballos G, Ehrlich PR, Naidoo R, et al. (2008) Cost-effective priorities for global mammal conservation. Proceedings of the National Academy of Sciences of the United States of America 105: 11446-11450.
38. Zhang M-G, Zhou Z-K, Chen W-Y, Slik JWF, Cannon CH, et al. (2012) Using species distribution modeling to improve conservation and land use planning of Yunnan, China. Biological Conservation 153: 257-264.

39. Carvalho SB, Brito JC, Crespo EG, Watts ME, Possingham HP (2011) Conservation planning under climate change: Toward accounting for uncertainty in predicted species distributions to increase confidence in conservation investments in space and time. Biological Conservation 144: 2020-2030.
40. Maiorano L, Falcucci A, Boitani L (2006) Gap analysis of terrestrial vertebrates in Italy: Priorities for conservation planning in a human dominated landscape. Biological Conservation 133: 455-473.
41. Hannah L, Midgley G, Andelman S, Araujo M, Hughes G, et al. (2007) Protected area needs in a changing climate. Frontiers in Ecology and the Environment 5: 131-138.
42. Papes M, Gaubert P (2007) Modelling ecological niches from low numbers of occurrences: assessment of the conservation status of poorly known viverrids (Mammalia, Carnivora) across two continents. Diversity and Distributions 13: 890-902.
43. Drielsma M, Manion G, Ferrier S (2007) The spatial links tool: Automated mapping of habitat linkages in variegated landscapes. Ecological Modelling 200: 403-411.
44. McRae BH, Dickson BG, Keitt TH, Shah VB (2008) Using circuit theory to model connectivity in ecology, evolution, and conservation. Ecology 89: 2712-2724.
45. Beier P, Spencer W, Baldwin RF, McRae BH (2011) Toward Best Practices for Developing Regional Connectivity Maps. Conservation Biology 25: 879-892.
46. Guisan A, Tingley R, Baumgartner JB, Naujokaitis-Lewis I, Sutcliffe PR, et al. (2013) Predicting species distributions for conservation decisions. Ecology Letters 16: 1424-1435.
47. Brotons L, Manosa S, Estrada J (2004) Modelling the effects of irrigation schemes on the distribution of steppe birds in Mediterranean farmland. Biodiversity and Conservation 13: 1039-1058.
48. Calabrese JM, Certain G, Kraan C, Dormann CF (2014) Stacking species distribution models and adjusting bias by linking them to macroecological models. Global Ecology and Biogeography 23: 99-112.
49. Parviainen M, Marmion M, Luoto M, Thuiller W, Heikkinen RK (2009) Using summed individual species models and state-of-the-art modelling techniques to identify threatened plant species hotspots. Biological Conservation 142: 2501-2509.
50. Loiselle BA, Howell CA, Graham CH, Goerck JM, Brooks T, et al. (2003) Avoiding pitfalls of using species distribution models in conservation planning. Conservation Biology 17: 1591-1600.
51. Newbold T, Gilbert F, Zalat S, El-Gabbas A, Reader T (2009) Climate-based models of spatial patterns of species richness in Egypt's butterfly and mammal fauna. Journal of Biogeography 36: 2085-2095.
52. Benito BM, Cayuela L, Albuquerque FS (2013) The impact of modelling choices in the predictive performance of richness maps derived from species-distribution models: guidelines to build better diversity models. Methods in Ecology and Evolution 4: 327-335.
53. Austin MP, Nicholls AO, Margules CR (1990) Measurement of the Realized Qualitative Niche - Environmental Niches of 5 Eucalyptus Species. Ecological Monographs 60: 161-177.
54. Warren DL, Glor RE, Turelli M (2008) Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution 62: 2868-2883.
55. Di Febbraro M, Lurz PWW, Genovesi P, Maiorano L, Girardello M, et al. (2013) The Use of Climatic Niches in Screening Procedures for Introduced Species to Evaluate Risk of Spread: A Case with the American Eastern Grey Squirrel. Plos One 8.
56. Lawson CR, Hodgson JA, Wilson RJ, Richards SA (2014) Prevalence, thresholds and the performance of presence–absence models. Methods in Ecology and Evolution 5: 54-64.
57. Svenning JC, Flojgaard C, Marske KA, Nogues-Bravo D, Normand S (2011) Applications of species distribution modeling to paleobiology. Quaternary Science Reviews 30: 2930-2947.
58. Richards CL, Carstens BC, Knowles LL (2007) Distribution modelling and statistical phylogeography: an integrative framework for generating and testing alternative biogeographical hypotheses. Journal of Biogeography 34: 1833-1845.

59. Knowles LL, Alvarado-Serrano DF (2010) Exploring the population genetic consequences of the colonization process with spatio-temporally explicit models: insights from coupled ecological, demographic and genetic models in montane grasshoppers. Molecular Ecology 19: 3727-3745.
60. Forester BR, DeChaine EG, Bunn AG (2013) Integrating ensemble species distribution modelling and statistical phylogeography to inform projections of climate change impacts on species distributions. Diversity and Distributions 19: 1480-1495.
61. Raxworthy CJ, Ingram CM, Rabibisoa N, Pearson RG (2007) Applications of ecological niche modeling for species delimitation: A review and empirical evaluation using day geckos (Phelsuma) from Madagascar. Systematic Biology 56: 907-923.
62. Pellissier L, Espindola A, Pradervand JN, Dubuis A, Pottier J, et al. (2013) A probabilistic approach to niche-based community models for spatial forecasts of assemblage properties and their uncertainties. Journal of Biogeography 40: 1939-1946.
63. Pineda E, Lobo JM (2009) Assessing the accuracy of species distribution models to predict amphibian species richness patterns. Journal of Animal Ecology 78: 182-190.
64. Trotta-Moreu N, Lobo JM (2010) Deriving the Species Richness Distribution of Geotrupinae (Coleoptera: Scarabaeoidea) in Mexico From the Overlap of Individual Model Predictions. Environmental Entomology 39: 42-49.
65. Meier ES, Kienast F, Pearman PB, Svenning JC, Thuiller W, et al. (2010) Biotic and abiotic variables show little redundancy in explaining tree species distributions. Ecography 33: 1038-1048.
66. Leathwick JR, Austin MP (2001) Competitive interactions between tree species in New Zealand's old-growth indigenous forests. Ecology 82: 2560-2573.
67. Ovaskainen O, Hottola J, Siitonen J (2010) Modeling species co-occurrence by multivariate logistic regression generates new hypotheses on fungal interactions. Ecology 91: 2514-2521.
68. Clark JS, Gelfand AE, Woodall CW, Zhu K (in press) More than the sum of the parts: Forest climate response from Joint Species Distribution Models. Ecological Applications http://dx.doi.org/10.1890/13-1015.1.
69. Pollock LJ, Tingley R, Morris WK, Golding N, O'Hara RB, et al. (in press) Understanding co-occurrence by modelling species simultaneously with a Joint Species Distribution Model (JSDM). Methods in Ecology and Evolution http://dx.doi.org/10.1111/2041-210X.12180.
70. de Araujo CB, Marcondes-Machado LO, Costa GC (2014) The importance of biotic interactions in species distribution models: a test of the Eltonian noise hypothesis using parrots. Journal of Biogeography 41: 513-523.

Appendix S4: Case studies

Case study 1 – Wildlife monitoring

This case study deals with the monitoring of temporal trends in species' area of occupancy, where separate models are fitted to each time step to capture changing relationships with the predictors (rather than projecting a single model to changing environmental conditions).

Methods

Species and SDMs
We simulated a species over a real landscape (a rectangular area of 301 x 240 pixels covering approximately 21 km x 22 km in the province of Teruel, Spain), assuming that its distribution is a function of mean elevation (obtained from a Digital Elevation Model; Jarvis et al. 2008) and that it is declining over time in this environment (e.g. due to changing biotic interactions). We used the following relationships at times 1, 2 and 3:

[…] with […]

where the modelled quantity is the probability of the species occurring at a given site and time, and the predictor is the standardized elevation at the site. Based on these probabilities, we determined the sites occupied by the species as the outcome of Bernoulli trials with the corresponding probability. We then simulated the random sampling of 500 sites in each time step, assuming perfect detection (i.e. the species is detected if it is present), and analysed the resulting data sets in two ways:

1) as presence/absence records (PA) using the R function glm (R Development Core Team 2011), allowing linear and quadratic terms;
2) as presence records and environmental background samples (PB) with Maxent (R package dismo), allowing linear and quadratic features, with default parameter values (incl. tau = 0.5).

SDM output usage
We computed the sum of the SDM estimates (absolute or relative probabilities of occurrence) obtained across the whole area at each time step (i.e. the sum of the estimated values over all sites), as well as the sum of the binary values obtained after applying the following thresholds (Liu et al. 2005), which are the most commonly used according to our literature review (Appendix S1):

1) 'MTP': minimum training presence (0% omission rate),
2) '10%TP': 10% percentile of training presence (10% omission rate),
3) 'SS=SP': sensitivity = specificity,
4) 'max(SS+SP)': max(sensitivity + specificity).

For PB data, specificity cannot be defined in the usual way given that absence records are not available; we followed the practice in Maxent that uses the background data instead, and is therefore related to predicted area. For details see Phillips et al. (2006) and the Maxent tutorial at http://www.cs.princeton.edu/~schapire/maxent/. The tutorial includes R code that details how Maxent calculates false positive rates (i.e. 1-SP).

Note: Given PA data, the thresholds above lead to the same binary maps regardless of a fixed scaling in SDM predictions. This property follows because these thresholds are defined based on percentiles (0% or 10%) or as a function of properties of the resulting binary SDM (sensitivity and specificity).
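The invariance described in the note can be checked with a minimal Python sketch for the two percentile-based thresholds. This is not the authors' R code: the predictions are random numbers standing in for fitted values, the function names are hypothetical, and the scaling constant is an assumption made for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical fitted probabilities at 200 surveyed sites (not from the paper).
pred = [random.random() for _ in range(200)]
# Simulated presence/absence records, assuming perfect detection.
presence = [random.random() < p for p in pred]


def mtp_threshold(values, pres):
    """Minimum training presence: lowest prediction at a presence site."""
    return min(v for v, z in zip(values, pres) if z)


def tp10_threshold(values, pres):
    """10th percentile of training presences (omits about 10% of them)."""
    at_presences = sorted(v for v, z in zip(values, pres) if z)
    return at_presences[int(0.10 * len(at_presences))]


def binary_map(values, threshold):
    """Convert continuous predictions to 0/1 using the given threshold."""
    return [int(v >= threshold) for v in values]


# The same predictions multiplied by an assumed constant (a relative likelihood).
scale = 0.5
scaled = [scale * v for v in pred]

same_mtp = (binary_map(pred, mtp_threshold(pred, presence))
            == binary_map(scaled, mtp_threshold(scaled, presence)))
same_tp10 = (binary_map(pred, tp10_threshold(pred, presence))
             == binary_map(scaled, tp10_threshold(scaled, presence)))

print("binary maps unchanged by scaling:", same_mtp, same_tp10)
print("sum of continuous output:", sum(pred), "vs scaled:", sum(scaled))
```

The binary maps coincide because each threshold is rescaled together with the predictions, whereas the sum of the continuous output changes by the scaling constant.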


Presentation of results
We repeated the simulation 20 times, plotted the range delimited by the 10th and 90th percentiles of the estimates calculated above and compared them with the area of occupancy (AOO) expected based on the true simulated probabilities (i.e. the sum of the true probabilities of occurrence over all sites).

Additional results
For simplicity, in the main text (Fig. 2) we only present results for two of the binary (thresholded) conversions. Fig. S4.1 shows the results for the full set of thresholds assessed (note the change in scale compared to Fig. 2).

References
Jarvis A., Reuter H.I., Nelson A., Guevara E. (2008) Hole-filled seamless SRTM data V4, International Centre for Tropical Agriculture (CIAT) [Available online from: http://srtm.csi.cgiar.org]

Liu C., Berry P.M., Dawson T.P. & Pearson R.G. (2005) Selecting thresholds of occurrence in the prediction of species distributions. Ecography, 28, 385-393.

Phillips S.J., Anderson R.P., Schapire R.E. (2006) Maximum entropy modelling of species geographic distributions. Ecological Modelling, 190, 231-259.

R Development Core Team (2011) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/.

Figure S4.1 Estimated area of occupancy (AOO; %) for a virtual species as a function of time, based on the output of PA and PB SDMs (left and right plot, respectively). Each bar summarizes the result of 20 simulations (10th-90th percentiles), with colours representing how the SDM output was used (in this order, black: no threshold, red: 'MTP', purple: '10%TP', blue: 'SS=SP', green: 'max(SS+SP)'). Red horizontal lines denote the true AOO (based on the simulated distribution).


Case study 2 – Invasive species prioritization list

Methods

Species and SDMs
We simulated a set of 25 virtual species across Australia, assuming that their distributions are a function of three climatic variables ("temperature seasonality", "minimum temperature of the coldest month" and "precipitation of driest quarter", corresponding to WorldClim variables bio4, bio6 and bio17, downloaded from http://www.worldclim.org; Hijmans et al. 2005). We used a generalized logistic model to simulate the environmental relationships of each species in two steps as follows:

1) […]
2) […]

where the modelled quantity is the probability of species occurrence at a site, the predictors are the values of the three climatic variables at that site after standardization, the regression coefficients were randomly drawn from a uniform distribution […], and the scaling coefficient was randomly drawn from […]. This procedure allowed us to easily model species with different overall prevalence. Based on these probabilities of species occurrence, we determined whether a site was occupied by each of the species as the outcome of Bernoulli trials with the corresponding probability. We then simulated data collection and analysed the resulting records as indicated in case study 1.

SDM output usage
For each species, we computed the sum of the SDM estimates across the whole landscape (i.e. the sum of the estimated values over all sites), using the continuous output as well as the binary outputs obtained after applying the thresholds detailed in case study 1.

Presentation of results
We plotted the resulting quantities with respect to the total area actually occupied by the species in our simulations, and in this way assessed whether the SDMs would have led to the correct species prioritization according to their area of occupancy (AOO).
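Why summed relative likelihoods can misorder species may be seen from a small Python sketch. The AOO values and the species-specific scaling constants below are hypothetical; the sketch assumes that a presence-background model returns, for each species, the true probabilities multiplied by a different unknown constant.

```python
# Hypothetical true AOO (fraction of the landscape) for four virtual species.
true_aoo = {"sp_a": 0.60, "sp_b": 0.40, "sp_c": 0.20, "sp_d": 0.10}

# Assumed species-specific constants (unknown in practice) by which the model
# output is scaled relative to the true probabilities.
scaling = {"sp_a": 0.2, "sp_b": 0.9, "sp_c": 1.0, "sp_d": 3.0}

# Summing a scaled surface over all sites scales the AOO by the same constant.
relative_sum = {sp: true_aoo[sp] * scaling[sp] for sp in true_aoo}


def ranking(scores):
    """Species ordered from largest to smallest score."""
    return sorted(scores, key=scores.get, reverse=True)


print("true order:     ", ranking(true_aoo))
print("estimated order:", ranking(relative_sum))
```

With these illustrative constants the most widespread species is placed last, which is the kind of misprioritization the case study is designed to detect.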
Additional results

For simplicity, in the main text we only present results for two of the binary (thresholded) conversions (Fig. 3). Fig. S4.2 shows the results for the remaining thresholds assessed.

References

Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. and Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology 25: 1965-1978.

[Figure S4.2: six scatterplot panels of estimated AOO (%) against true AOO (%). Rows: PA, PB. Columns: continuous output, binary output (MTP), binary output (10% TP).]

Figure S4.2 Species prioritization list case study. Estimated area of occupancy (AOO; in %) for 25 virtual species as a function of their true AOO (in %), based on the output of PA and PB SDMs (first and second row, respectively). In the first column the SDM output is used without a threshold. In the second and third columns the SDM output is first converted to a binary output using a threshold: minimum training presence (MTP) and 10th percentile of training presences (10% TP). The PB SDM does not allow prioritizing species correctly. Binary conversion does not solve the problem, and is detrimental for estimation based on PA data.


Case study 3 – Optimal surveillance of invasive species

This case study deals with the consequences of using incorrect estimates of probability of occurrence to inform the surveillance of an invasive species that is known to have arrived in an area and has been identified as damaging. The method considered (Hauser & McCarthy 2009) determines the best allocation of resources at a given point in time; hence it does not explicitly account for the dynamic nature of the invasion. Between seasons, managers have the opportunity to update occupancy models with the new survey data, predict possible invasion spread and re-prioritize effort for the next survey. The scenario considered is as follows: a site is surveyed and, if the species is detected, the site is managed for its eradication; if the species is not detected, the site is not managed, under the assumption that the species is not present. However, the species could have been present and missed, with the lack of early management leading to a wider infestation that ultimately requires higher management costs.

Methods

Species and SDMs

We simulated the distribution of a virtual invasive species over a real landscape (as in case study 1), assuming that its presence was a function of the standardized elevation as follows: [...], where $\psi_i$ is the probability of species occurrence at site $i$. Based on these probabilities, we determined which sites were occupied by the species as the outcome of Bernoulli trials with probability $\psi_i$. We then simulated data collection and analysed the resulting records as indicated in case study 1 (i.e. as PA or PB data with GLM or Maxent, respectively).

SDM output usage

We used the output of the SDMs to determine the level of surveillance that minimizes expected total costs, following the detection and treatment model for invasive species proposed by Hauser & McCarthy (2009) and assuming no budgetary constraints. Following Hauser & McCarthy (2009), we used an exponential model for the detection of the species and assumed that surveys do not stop after detection at a site (or, more generally, that the full survey costs allocated to a site are expended regardless of whether the species is detected). Total expected costs are the sum of survey costs, expected costs associated with detection and expected costs associated with missing the species during surveys, that is

$$E[C_i] = x_i + \psi_i \, p_i(x_i) \, C_e + \psi_i \, [1 - p_i(x_i)] \, C_d$$

where $\psi_i$ is the probability that the species is present at site $i$, $p_i(x_i) = 1 - e^{-\lambda x_i}$ is the probability of detecting the species if present at site $i$ in a survey of length $x_i$, $\lambda$ is the rate of detection per unit length, $C_e$ are the costs associated with early management of the species and $C_d$ are the costs associated with delayed management. Under this model, assuming no budgetary constraints, the optimal level of surveillance that minimizes total expected costs is

$$x_i^* = \begin{cases} \dfrac{1}{\lambda} \ln\left[\lambda \, \psi_i \, (C_d - C_e)\right] & \text{if } \psi_i > \dfrac{1}{\lambda \, (C_d - C_e)} \\ 0 & \text{otherwise.} \end{cases}$$

We computed the above expression for each site, taking the estimates $\hat{\psi}_i$ obtained from the SDMs as the values of $\psi_i$. For comparison, we also determined the optimal survey length based on the true occurrence probability values used to simulate the species. Finally, we computed the ratio between the total cost expected when surveys of the length determined above are carried out and the cost expected when no surveys are carried out (i.e. $E[C_i \mid x_i = x_i^*] / E[C_i \mid x_i = 0]$). In our analysis, we assumed $\lambda$ = [...], $C_e = 10$ and $C_d = 110$ for all sites.

Presentation of results

We plotted the level of surveillance determined as optimal for each site under each scenario (true, PA and PB), as well as the cost ratio computed as described above.

References

Hauser, C.E. & McCarthy, M.A. (2009) Streamlining 'search and destroy': cost-effective surveillance for invasive species management. Ecology Letters 12: 683-692.
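
The surveillance calculation in case study 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the analysis: the survey cost is taken to equal the survey length, the detection rate `LAM` is a hypothetical value, the function names are invented for the example, and the cost of a survey plan chosen from an SDM estimate is evaluated against the true occurrence probability. The management costs of 10 and 110 are the values given in the text.

```python
import math

C_EARLY = 10.0     # cost of early management (value given in the text)
C_DELAYED = 110.0  # cost of delayed management (value given in the text)
LAM = 1.0          # ASSUMPTION: hypothetical detection rate per unit length


def detection_prob(x, lam=LAM):
    """Exponential detection model: P(detect | present) for survey length x."""
    return 1.0 - math.exp(-lam * x)


def expected_cost(x, psi, lam=LAM):
    """Survey cost + expected early-management cost + expected delayed cost."""
    p = detection_prob(x, lam)
    return x + psi * p * C_EARLY + psi * (1.0 - p) * C_DELAYED


def optimal_survey_length(psi, lam=LAM):
    """Survey length minimising expected_cost; zero if surveying never pays."""
    gain = lam * psi * (C_DELAYED - C_EARLY)
    return math.log(gain) / lam if gain > 1.0 else 0.0


def cost_ratio(psi_true, psi_used, lam=LAM):
    """Cost with surveys planned from psi_used, relative to no surveys.

    Both costs are evaluated with psi_true, which must be greater than zero.
    """
    x = optimal_survey_length(psi_used, lam)
    return expected_cost(x, psi_true, lam) / expected_cost(0.0, psi_true, lam)


psi_true = 0.5
psi_relative = 0.1 * psi_true  # hypothetical PB-like output, too low by a constant
x_true = optimal_survey_length(psi_true)
x_rel = optimal_survey_length(psi_relative)
ratio_true = cost_ratio(psi_true, psi_true)
ratio_rel = cost_ratio(psi_true, psi_relative)

print(f"survey length: true {x_true:.2f}, relative {x_rel:.2f}")
print(f"cost ratio:    true {ratio_true:.2f}, relative {ratio_rel:.2f}")
```

In this toy setting, an occurrence estimate that is too low by a constant factor leads to a shorter survey and a higher expected cost than the plan based on the true probability, although surveying still costs less than not surveying at all.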


Case study 4 – Modelling of biodiversity patterns: species richness

Methods

Species and SDMs

We simulated a set of 100 virtual species, assuming that their distributions are a function of a covariate C as follows: [...], where $\psi_{ij}$ is the probability of occurrence of species $j$ at site $i$, the regression coefficients are randomly drawn from a normal distribution (with mean $\bar{\beta}$ and variance 0.5), and the scaling coefficient is randomly drawn from a uniform distribution. We simulated four scenarios (Table S4.1) that represent different cases of similarity among species, using two standardized, uniformly distributed covariates.

Table S4.1 Scenarios simulated

Scenario  Parameters                              Description
1         50 species: [...]; 50 species: [...]    All species depend on the same covariate, with either a positive or negative relationship (50% of each).
2         50 species: [...]; 50 species: [...]    Half of the species depend on one covariate and half on a different covariate.
3         50 species: [...]; 50 species: [...]    As scenario 1, but species that have a positive relationship tend to have lower prevalence.
4         50 species: [...]; 50 species: [...]    As scenario 2, but species that depend on the first covariate tend to have lower prevalence.

Based on these probabilities of species occurrence, we determined the sites occupied by the species as the outcome of Bernoulli trials with probability $\psi_{ij}$. We then simulated data collection and analysed the resulting records as indicated in case study 1. In our simulations species richness values ranged as follows: 5-25 species (case 1), 1-38 species (case 2), 5-33 species (case 3), 2-35 species (case 4).

SDM output usage

We stacked SDMs of individual species to estimate species richness (i.e. number of species present) at each of 500 randomly chosen sites, following a “predict first, assemble later” strategy (Ferrier & Guisan 2006). We did this by adding the SDM estimates obtained for each site across all species (i.e. $\hat{R}_i = \sum_j \hat{\psi}_{ij}$), either using the continuous output or the binary values obtained after applying the thresholds described in case study 1.

Presentation of results

We compared the estimated richness with the actual richness at each site in the simulation (i.e. number of species present: $R_i = \sum_j z_{ij}$, where $z_{ij}$ indicates whether species $j$ is present at site $i$). We used scatterplots, and calculated the correlation and the root mean square error (i.e. RMSE = $\sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{R}_i - R_i)^2}$, where $n$ is the number of sites).
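
The stacking and evaluation steps for species richness can be sketched as follows. This is a minimal Python illustration rather than the analysis code: the scaled logistic used to generate the virtual species, the mean of the coefficient distribution, the range of the scaling coefficient and the threshold of 0.05 are hypothetical choices, the true probabilities stand in for fitted SDMs, and the helper names are invented for the example. Spearman's correlation is computed as the Pearson correlation of average ranks.

```python
import math
import random


def stack_richness(pred_by_species):
    """Sum per-species values site by site ("predict first, assemble later")."""
    n_sites = len(pred_by_species[0])
    return [sum(sp[i] for sp in pred_by_species) for i in range(n_sites)]


def rmse(estimated, actual):
    """Root mean square error between estimated and actual richness."""
    n = len(actual)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else float("nan")


def spearman(a, b):
    """Spearman's rank correlation coefficient."""
    return pearson(average_ranks(a), average_ranks(b))


rng = random.Random(7)
n_species, n_sites = 100, 500
covariate = [rng.uniform(-1.0, 1.0) for _ in range(n_sites)]  # hypothetical range
psi, occ = [], []
for _ in range(n_species):
    beta = rng.gauss(1.0, math.sqrt(0.5))  # variance 0.5 as in the text; mean is hypothetical
    scale = rng.uniform(0.1, 0.5)          # hypothetical scaling range
    p = [scale / (1.0 + math.exp(-beta * c)) for c in covariate]
    psi.append(p)
    occ.append([1 if rng.random() < v else 0 for v in p])

actual = stack_richness(occ)
est_cont = stack_richness(psi)
est_bin = stack_richness([[1 if v >= 0.05 else 0 for v in p] for p in psi])
rmse_cont, rmse_bin = rmse(est_cont, actual), rmse(est_bin, actual)

print(f"RMSE: continuous {rmse_cont:.2f}, binary {rmse_bin:.2f}")
print(f"Spearman: continuous {spearman(est_cont, actual):.2f}, "
      f"binary {spearman(est_bin, actual):.2f}")
```

With a low threshold most species are predicted present almost everywhere, so the stacked binary maps overestimate richness far more than the stacked probabilities do.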


Additional results

For simplicity, in the main text we only present the results of simulating scenario 4, including two of the binary conversions. Full results are summarized in Table S4.2.

References

Ferrier, S. & Guisan, A. (2006) Spatial modelling of biodiversity at the community level. Journal of Applied Ecology 43: 393-404.

Table S4.2 Performance of species richness estimation based on PA or PB stacked SDMs, with and without binary conversion, for the four simulated scenarios detailed in Table S4.1. The first half of the table (first 4 rows of body of table) presents the root mean square error (RMSE) in the estimation of species richness across all sites. This indicates whether richness is accurately estimated. The second half (rows 5 to 8) presents the correlation between actual and estimated richness (Spearman's correlation coefficient). This indicates whether sites are well ranked in terms of richness. For reference, the table includes the results obtained if the ‘true’ probabilities used to generate the data were used in the analysis.

            true   PA (logistic)                      PB (Maxent)
      case  prob   cont   T1     T2     T3     T4     cont   T1     T2     T3     T4
RMSE  1     3.49   3.48   78.3   55.5   27.6   29.7   27.6   78.6   56.7   27.1   29.4
RMSE  2     3.50   3.50   76.4   56.0   33.3   35.1   25.6   76.8   56.8   30.4   34.3
RMSE  3     3.32   3.32   75.6   54.5   25.4   26.8   26.8   75.0   53.9   25.5   28.9
RMSE  4     3.13   3.09   75.8   54.6   31.3   34.6   26.9   75.8   55.6   31.8   38.1
corr  1     0.35   0.36   -0.33  -0.32  0.22   0.28   -0.10  -0.34  -0.29  0.23   0.23
corr  2     0.88   0.88   0.62   0.77   0.86   0.87   0.86   0.60   0.74   0.85   0.86
corr  3     0.76   0.76   -0.60  -0.28  0.54   0.50   -0.73  -0.59  -0.37  0.03   -0.03
corr  4     0.91   0.91   0.31   0.71   0.81   0.83   0.77   0.36   0.71   0.78   0.82

cont: continuous output of the SDMs used; otherwise, binary output used based on the following thresholds: T1: ‘MTP’; T2: ‘10% TP’; T3: ‘SS=SP’; T4: ‘max(SS+SP)’.


Case study 5 – Spatial prioritization

For this case study we used Marxan v2.4 (Ball et al. 2009) and Zonation v.3.1 (Moilanen et al. 2012). Both software packages are freely available and widely used in spatial prioritization. Marxan and Zonation are based on different principles and algorithms; hence the results of the two programs are not directly comparable. We present the results obtained for the Marxan analysis in the main body of the paper. The results of the Zonation analysis are presented and discussed here in the appendix.

Methods

Species and SDMs

We used six sets of SDMs to explore the effect of SDM output scaling and binary conversion on reserve selection. The SDMs represent seven charismatic species that occur in the Lower Hunter region of New South Wales, Australia: koala (Phascolarctos cinereus), yellow-bellied glider (Petaurus australis), squirrel glider (Petaurus norfolcensis), tiger quoll (Dasyurus maculatus), powerful owl (Ninox strenua), sooty owl (Tyto tenebricosa) and masked owl (Tyto novaehollandiae) (Figure S4.3). Each of the species represents one conservation feature in our planning problem. The first set of SDMs (hereafter referred to as ‘original’) was obtained based on PA data, as described in Wintle et al. (2005).

Figure S4.3 The original set of SDMs for the seven species that were analysed using Marxan, and an example of a Marxan solution.


The second set (‘scaled’ SDMs) mimics PB methods that produce estimates of relative likelihoods of species occupancy. To generate these, the probabilities in each of the original SDMs were scaled by factors of 0.9, 0.8, 0.7, 0.6, 0.5, 0.4 and 0.3, respectively. We then applied the thresholds detailed in case study 1 for binary conversion of the scaled SDMs to produce the remaining four sets of SDMs (note that, as explained above in case study 1, the thresholds used lead to the same binary maps regardless of a fixed scaling in SDM predictions; the thresholds themselves change, but the resulting maps are the same).

We also created a dataset of simulated presence/absence records based on the original SDMs, which were assumed to represent the ‘true’ distribution of each species. We used this dataset to assess the performance of our prioritization results by evaluating how well the species’ ‘true’ distributions were incorporated into solutions produced with the six different SDM datasets. We determined the value of the thresholds using a random sample of those 500 presence/absence records (mimicking a sampled dataset).

SDM output usage - Marxan

Marxan addresses the minimum set problem, which is to design a reserve system that meets a set of conservation targets whilst minimising the cost, and in some cases also the perimeter-to-core ratio (boundaries) of the reserves (Ball et al. 2009). To do this, Marxan employs a heuristic algorithm (simulated annealing) to optimise the objective function:

$$\text{minimise} \quad \sum_{i=1}^{N} c_i x_i + b \sum_{i=1}^{N} \sum_{h=1}^{N} x_i (1 - x_h) \, cv_{ih}$$

$$\text{subject to} \quad \sum_{i=1}^{N} x_i r_{ij} \geq T_j \quad \forall j = 1, \ldots, M$$

where $N$ is the number of planning units in the solution, $c_i$ is the cost of conserving planning unit $i$, $M$ is the number of conservation features, $r_{ij}$ is the amount of conservation feature $j$ in planning unit $i$, and $T_j$ is the target for feature $j$. The control variable $x_i$ takes a value 1 if unit $i$ is included in the solution, and 0 if it is excluded. The boundary length modifier (BLM), $b$, influences the level of compactness of the final reserve system, and $cv_{ih}$ is the connectivity matrix between units $i$ and $h$.

To run Marxan, each of the 110,282 square cells across the Lower Hunter region covered by the SDMs was given a unique identifier, and these formed the planning units from which Marxan could design the reserve system. For the analysis, the amount of each conservation feature (species) contained in each cell was set to the prediction provided by the corresponding SDM (i.e. $r_{ij}$ equal to the SDM estimate $\hat{\psi}_{ij}$ for species $j$ in cell $i$). We set BLM to 1 and assigned the same cost to all cells ($c_i = 1$).

Two different types of targets were assessed. For the first, ‘proportional’ targets, Marxan had to ensure that a set proportion of the area that each species occupied was in the final reserve system. These proportions differed between species, and were set at 0.4, 0.6, 0.35, 0.30, 0.70, 0.2 and 0.5, respectively. For the second type, ‘absolute’ targets, Marxan was used to identify a reserve system within which each species had a fixed area of occupied habitat. We chose the absolute target by multiplying the proportional targets listed above for each species by the total expected area they occupied in the original SDMs (i.e. $\sum_i \hat{\psi}_{ij}$), which resulted in 14112, 17485, 12357, 10535, 15996, 7338, and 17370 cells, respectively. We analysed twelve scenarios in Marxan, with each of the six sets of SDMs run against the two different types of target. For each scenario, Marxan produced 100 near-optimal solutions, and we selected the best solution (the one that had the lowest objective function score out of the 100).

For each scenario we took the cells selected by Marxan for the reserve system, and plotted the target that had been set for each species against the area that it occupied in the selected cells according to the simulated presence/absence records. Using these plots, we assessed how the type of SDM used and/or the type of target (proportional or absolute) influenced the degree to which Marxan over- or under-shot the true target in the solution. We also calculated the total cost of each solution, equal to the number of cells that Marxan placed in the reserve system, and compared this across scenarios.

SDM output usage - Zonation

We ran a similar spatial prioritization analysis using the Zonation software (Moilanen et al. 2012). Zonation addresses the maximum coverage/utility problem, where biodiversity benefits are maximized within given budgetary or area constraints. Its prioritization algorithm works backwards: it starts by assuming that all sites (or grid cells) in the landscape are protected, and progressively removes sites that cause the smallest marginal loss in the conservation value of the remaining set of sites. Hence the least valuable sites are removed first and the most valuable sites retained until the very end; in doing so the removal process produces a ranking (or priority value) for each site.
By default Zonation does not use a priori defined targets: instead the priority ranking is used to identify either i) the number of top priority sites that can be protected given area/budgetary constraints, or ii) the set of sites needed to capture a given proportion of the features' distributions. In this exercise we used the area constraint and assumed that 20% of the area of the Lower Hunter can be protected.

For our analysis, we ran six prioritization scenarios, one for each of the SDM sets, and evaluated what proportion of the true distribution of each species (simulated presence/absence records) was captured by the top 20% priorities in each of the scenarios (Fig. S4.4). To calculate the value of a site during the removal process, we used the ‘core-area’ definition of marginal loss $\delta_i$, which is as follows:

$$\delta_i = \max_j \frac{r_{ij}}{c_i \sum_{k \in S} r_{kj}}$$

where $r_{ij}$ is the occurrence level of feature $j$ (here species $j$) in cell $i$, $S$ is the remaining set of cells at each point of the removal process and $c_i$ is the cost of keeping cell $i$ in the reserve network. As in the Marxan analysis, the value within a cell for each species was set to the corresponding SDM prediction (model probability, relative likelihood, or “presence/absence”; $r_{ij} = \hat{\psi}_{ij}$), and all cells were assigned a uniform cost. For simplicity, we did not include connectivity considerations in this analysis, although the program can account for both structural and functional connectivity.

Results: Zonation

Scaling SDMs (i.e. using relative likelihoods instead of probabilities) does not affect the planning solution provided by Zonation (Fig. S4.4). This is because the program computes the value of cells in terms of their relative contribution to the total amount of each feature (i.e. $\delta_i$ is proportional to $r_{ij} / \sum_{k \in S} r_{kj}$, and hence any scaling in $r_{ij}$ cancels out). This result is expected to hold also for other maximum coverage/utility algorithms, as long as they are formulated in terms of the relative contribution of cells.
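
The cancellation of a per-species scaling can be checked with a toy backward-removal ranking. The Python sketch below is a simplified stand-in for the core-area rule, not Zonation itself: it takes a cell's loss to be the largest share, over species, of the remaining total that the cell holds, divided by the cell cost, and it ignores feature weights, connectivity and Zonation's other removal options. The function names, the four-cell landscape, the scaling factors (powers of two, so that the floating-point comparison is exact) and the 0.5 threshold are all hypothetical.

```python
def core_area_loss(cell, remaining, r, cost):
    """Largest share, over features, of the remaining total held by `cell`."""
    best = 0.0
    for j in range(len(r[cell])):
        total_j = sum(r[k][j] for k in sorted(remaining))
        if total_j > 0:
            best = max(best, r[cell][j] / total_j)
    return best / cost[cell]


def removal_order(r, cost):
    """Backward removal: repeatedly drop the remaining cell with the smallest loss.

    Returns cell indices from lowest to highest priority; ties go to the
    lowest cell index.
    """
    remaining = set(range(len(r)))
    order = []
    while remaining:
        worst = min(sorted(remaining),
                    key=lambda i: core_area_loss(i, remaining, r, cost))
        order.append(worst)
        remaining.remove(worst)
    return order


# Hypothetical occurrence probabilities: 4 cells (rows) x 2 species (columns).
r_prob = [[0.9, 0.1],
          [0.2, 0.8],
          [0.5, 0.5],
          [0.1, 0.1]]
cost = [1.0, 1.0, 1.0, 1.0]  # uniform cost

# Relative likelihoods: each species scaled by its own constant.
factors = [0.5, 0.25]
r_scaled = [[f * v for f, v in zip(factors, row)] for row in r_prob]

# Binary conversion with a hypothetical threshold of 0.5.
r_binary = [[1 if v >= 0.5 else 0 for v in row] for row in r_prob]

order_prob = removal_order(r_prob, cost)
order_scaled = removal_order(r_scaled, cost)
order_binary = removal_order(r_binary, cost)

print("probabilities:", order_prob)
print("scaled:       ", order_scaled)
print("binary:       ", order_binary)
```

In this example the probability and scaled inputs yield the same removal order, while the binary input yields a different one.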


Scaling would however matter if the prioritization focuses on the (absolute) expected area of occupancy captured by selected sites or if such quantities are otherwise used in post hoc analyses.

As with Marxan, binary conversion of SDM outputs can affect the Zonation prioritization, notably distorting the proportion of species’ ‘true’ representation within the selected priority areas (Fig. S4.4). The conversion leads to information loss, which ultimately may result in spending too many resources on some species while under-representing others.

References

Ball, I.R., Possingham, H.P. & Watts, M.E. (2009) Marxan and relatives: Software for spatial conservation prioritisation. Pages 185–196 in A. Moilanen, K.A. Wilson and H.P. Possingham, editors. Spatial Conservation Prioritization: Quantitative Methods and Computational Tools. Oxford University Press, New York City.

Moilanen, A., Meller, L., Leppänen, J., Montesino Pouzols, F., Arponen, A. & Kujala, H. (2012) Zonation: Spatial conservation planning framework and software version 3.1 user manual. Helsingin Yliopisto, Helsinki.

Wintle, B.A., Elith, J. & Potts, J.M. (2005) Fauna habitat modelling and mapping: A review and case study in the Lower Hunter Central Coast region of NSW. Austral Ecology 30: 719–738.

Figure S4.4 Spatial prioritization (case study 5). Plots represent the proportion of ‘true’ occupied area for each of the seven species captured by the top 20% priority areas in Zonation (y-axis). Closed circles represent solutions produced for the original PA SDM probabilities, open circles for scaled probabilities, and other symbols represent solutions where a threshold has been applied to create binary SDMs: ‘MTP’, ‘10% TP’, ‘SS=SP’ and ‘max(SS+SP)’.
