bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1 A systematic review of the relation between species traits and
2 extinction risk
3
4 Filipe Chichorro 1*, Aino Juslén2 and Pedro Cardoso1
5 1LiBRE – Laboratory for Integrative Biodiversity Research, Finnish Museum of Natural History, University
6 of Helsinki, PO Box 17 (Pohjoinen Rautatiekatu 13), 00014 Helsinki, Finland
7 2Finnish Museum of Natural History, University of Helsinki, PO Box 17 (Pohjoinen Rautatiekatu 13), 00014
8 Helsinki, Finland
9 *Address for correspondence (Tel: +358 469 508 202; E-mail: [email protected])
10
11 ABSTRACT
12
13 Biodiversity is shrinking rapidly, and despite our efforts only a small part of it has been assessed for
14 extinction risk. Identifying the traits that make species vulnerable might help us to predict the outcome for
15 those less known. We gathered information on the relations of traits to extinction risk from 173 publications,
16 across all taxa, spatial scales and biogeographical regions, in what we think is the most comprehensive
17 compilation to date. The Palaearctic is the most represented biogeographical realm in extinction risk studies,
18 but the other realms are also well sampled for all vertebrate groups. Other taxa are less well represented and
19 there are gaps for some biogeographical realms. Many traits have been suggested to be good predictors of
20 extinction risk. Among these, our meta-analyses were successful in identifying two as potentially useful in
21 assessing risk for the lesser-known species: regardless of the taxon, species with small range and habitat
22 breadth – two of the three classic dimensions of rarity – are more vulnerable to extinction. On the other hand,
23 body size (the most studied trait) did not present a consistently positive or negative response. Large species
1 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
24 can be more, or less, threatened than their smaller counterparts, depending on the taxon and spatial setting. In
25 line with recent research, we hypothesize that the relationship between body size and extinction risk is
26 shaped by different aspects, namely body size is a proxy for different phenomena depending on the
27 taxonomic group, and there might be relevant interactions between body size and the specific threat that a
28 species faces. With implications across all traits, a particular focus in future studies should be given to the
29 interaction between traits and threats. Furthermore, intraspecific variability of traits should be part of future
30 studies whenever data is available. We also propose two complementary paths to better understand extinction
31 risk dynamics: (i) the simulation of virtual species and settings for theoretical insights, and (ii) the use of
32 comprehensive and comparable empirical data representative of all taxa and regions under a unified study.
33 Keywords: biological traits, functional diversity, IUCN, machine learning, meta-analysis, threat status.
34 Contents
35 ABSTRACT ...... 1
36 I. Introduction ...... 4
37 II. Material and Methods ...... 6
38 (1) Bibliography selection...... 6
39 (2) Statistical analysis ...... 8
40 (a) Data collection ...... 8
41 (b) Trait classes ...... 9
42 (c) Exploratory analysis ...... 10
43 (d) Meta-analysis ...... 10
44 III. Results ...... 11
45 (1) Bibliography selection...... 11
46 (2) Exploratory analysis ...... 12
47 (a) Studies ...... 12
2 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
48 (b) Traits ...... 12
49 (3) Meta-analysis ...... 13
50 IV. Discussion ...... 14
51 (1) Relation between traits and extinction risk ...... 15
52 (2) Generalization...... 20
53 (3) Next steps ...... 21
54 V. Conclusions ...... 23
55 VI. Acknowledgements ...... 25
56 VII. References ...... 26
57 VIII. Supporting Information ...... 53
58 Figures ...... 54
59 Tables ...... 59
60
61
3 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
62 I. Introduction
63
64 The last annual checklist of the Catalogue of Life reports 1.7 million (eukaryote) species described on this
65 planet (Roskov et al., 2017). Describing the remaining estimated 7 million might still take 1,200 years if the
66 rates of description, average number of new species described per taxonomist’s career, and average cost to
67 describe species remain constant (Mora et al., 2011). This latency in describing species is put into a typical
68 crisis context if, as predicted, 3,000 species fall into extinction every year (Wilson, 2003; González-Oreja,
69 2008), many of them undescribed.
70 The International Union for Conservation of Nature (IUCN) compiles and keeps updated a database with
71 assessments of risk of extinction for species. As of August 2018, 26 197 (28%) of all 93 577 species in this
72 list were either Critically Endangered, Endangered, or Vulnerable to extinction and 14 616 (16%) were Data
73 Deficient (IUCN, 2018). Yet, species in the IUCN database mostly comprise well-known taxa (e.g. 67% of
74 vertebrates have been assessed versus 0.8% of insects (IUCN, 2018)), and it will probably take decades until
75 a reasonable proportion of many taxa, such as most invertebrates, are assessed (Cardoso et al., 2011a, b,
76 2012). Increasing the number of species in the database to the point where we have an unbiased picture of
77 extinction risk across all organisms during the next years seems highly unlikely, as is the Barometer of Life
78 goal of assessing 160 000 species by 2020 (Stuart et al., 2010). Moreover, extinction is taxonomically
79 selective (e.g. 63% of cycads are assessed as threatened versus ‘only’ about 13% of bird species (IUCN,
80 2018)). The current proportions of endangered species might not represent the greater picture of species
81 diversity. Therefore, alternative ways of predicting the risk of extinction of species are urgently needed.
82 Understanding which biological/ecological traits of species make them more vulnerable could help us predict
83 their extinction risk and make species protection and conservation planning more efficient. This approach is
84 not new. Some comparative studies can be traced back to the 19th century (see McKinney, 1997, for a
85 thorough historical perspective), and since the beginning of the new millenium many new comparative
86 studies have arisen on the topic, as well as discussions over their usefulness (Fisher & Owens, 2004; Cardillo
87 & Meijaard, 2012; Murray et al., 2014; Verde Arregoitia, 2016). Many traits have been tested across
4 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
88 hundreds of publications. Body size, for example, was found to be positively correlated with extinction risk
89 across multiple taxa (Seibold et al., 2015; Terzopoulou et al., 2015; Verde Arregoitia, 2016), either through
90 direct effects (e.g. larger species require more resources) or as a proxy for other traits (e.g. larger species
91 have slower life cycles and therefore respond more slowly to change). Range size and population density,
92 even after taking into account that they are often used to quantify extinction risk, have also been extensively
93 tested and found to be relevant, at least for mammals (Purvis et al., 2000a; González-Suárez, Gómez &
94 Revilla, 2013; Bland et al., 2015; Verde Arregoitia, 2016). Traits related to exposure to human pressures
95 have also been relevant in predicting threats to species (Cardillo, 2003), and recently Murray et al. (2014)
96 have called for more studies explicitly incorporating threats and the interplay between traits and threats into
97 the analyses. The inclusion of threat information in predicting extinction risk has indeed proved to increase
98 the explanatory power of models (Murray et al., 2014), and in some cases the same trait can bolster
99 extinction risk or prevent it, depending on the threat (González-Suárez et al., 2013).
100 Most of the studies to date have focused on mammals (e.g. Purvis et al., 2000a; Cardillo et al., 2008;
101 González-Suárez et al., 2013) and other vertebrates (e.g. Owens & Bennett, 2000; Luiz et al., 2016), with
102 relatively few on plants (e.g. Sodhi et al., 2008; Powney et al., 2014; Stefanaki et al., 2015) and invertebrates
103 (e.g. Sullivan et al., 2000; Koh, Sodhi & Brook, 2004; Arbetman et al., 2017). Each study was made on
104 different spatial settings and scales, testing different traits (often according to availability of data), and
105 employed different methods and response variables. While this is necessary and valuable information,
106 making sense of the plethora of contrasting results is difficult, and perceiving general trends and trying to
107 cover current gaps and bias are urgent. In this work we attempt to answer the following questions through a
108 comprehensive bibliography search, data exploration and meta-analysis:
109 ● Which traits have been studied more often?
110 ● Which traits have been suggested as predictors of extinction risk?
111 ● How generalizable are the past results?
112 ● What directions should research take in the future to overcome the shortage and bias of data?
5 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
113 II. Material and Methods
114
115 In this review we undertook two sequential analyses of studies that examined the relation between traits of
116 species and their estimated extinction risk: first, an exploratory analysis of the literature, to know which traits
117 have been more studied and which were found to be most relevant, and second, multiple meta-analyses, in
118 which we tried to use comparable data from a subset of the studies to understand and quantify trends across
119 studies and taxa from all published data and to see whether any general conclusions can be made from
120 existing literature.
121 (1) Bibliography selection
122 We were aiming to retrieve a list of all publications that have explicitly performed comparative studies of
123 biological/ecological traits and extinction risk/decline of species and to identify which traits, extrinsic factors
124 and taxa were used in each analysis and at which spatial scale. We searched in the Web of Science using the
125 keywords ‘trait*’ and ‘extinct*’ in October 2016 and obtained a list of 2,616 manuscripts. To consider a
126 given paper as relevant to our study, all of the following conditions had to be met:
127 ● more than five species were involved in the study;
128 ● for each species there was information on at least one biological trait;
129 ● for each species there was some kind of quantification of its extinction risk;
130 ● there was a statistical model linking the species traits (explanatory variables) to the extinction risk
131 (response variables), assigning scores to each trait involved (not necessarily significance).
132
133 We considered as extinction risk measurements any of the following:
134 ● recent (anthropogenic) extinctions versus extant species;
135 ● any variable (continuous, ordinal, categorical or binary) directly indicating relative extinction risk,
136 whether it was based on the IUCN Red List categories or not;
137 ● population trend data, or a proxy of population trend data, in time;
138 ● any other variables that indicated decline of species over time and/or risk of extinction.
6 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
139
140 On a first reading of a small part of the studies, we realized that most were not relevant to the study, as either
141 traits or extinction were marginal to our questions (some were not even related to biology) or some of the
142 minimum requirements were missing. Analysing 2,616 manuscripts would have been too time consuming
143 when a relatively small fraction would be relevant. For this reason, we used an automated process to sort out
144 relevant publications.
145 We used a random forest algorithm trained to identify relevant from non-relevant publications. We first
146 created two sets of 35 abstracts+titles, one with relevant and one with non-relevant papers (70 in total). As
147 words with common roots (e.g. extinction and extinct) should be quantified together, we used the R software
148 (R Core Team, 2018) and package SnowballC (Bouchet-Valat, 2014) to stem these words (e.g. all words in
149 the ‘extinct’ family fall into the same stem). We also excluded hyphenation, punctuation and stopwords
150 (Rajamaran & Ullman, 2011) using R packages SnowballC and tm (Feinerer & Hornik, 2008). Then, all the
151 words and corresponding word counts of each abstract were used as variables in training the random forest
152 model.
153 Decision trees were built with packages rpart (Therneau & Atkinson, 2015) and rpart.plot (Milborrow,
154 2018). The training of the random forest algorithm was run with 50 000 trees and 45 variables tried at each
155 step (the suggested number of variables is the square root of the number of words, Liaw & Wiener, 2002)
156 with R package randomForest (Liaw & Wiener, 2002). To validate the model, we assembled another set of
157 abstracts (n = 76, test set) and calculated the sensitivity, specificity and the true skill statistic (TSS) of the
158 model based on predictions on the test set. After training and validating the algorithm, we used it to classify
159 the remaining 2,000+ abstracts. In order to further test for false negatives, i.e. papers incorrectly labelled as
160 non-relevant leading to the loss of important studies, we randomly picked 100 abstracts labelled as non-
161 relevant and checked them.
162 To update the full set of relevant papers, we performed another search using the same keywords in June
163 2018, retrieving new publications and obtaining 682 further studies from the Web of Knowledge. The trained
164 algorithm was again used to filter potentially relevant publications from these. We finally added to the
165 relevant set all papers of which we had prior knowledge by other means.
7 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
166
167 (2) Statistical analysis
168 (a) Data collection
169 We assembled information on each comparative statistical test employed in each article. For each of these
170 tests, we extracted information on the following (see also Appendix S1, S2):
171 ● Taxonomic group: mammals, birds, reptiles, amphibians, fishes, insects, molluscs, other
172 invertebrates, plants and fungi.
173 ● Geographical realm: Afrotropics, Antarctic, Australasia, Indo-Malaya, Nearctic, Neotropics,
174 Oceania and Palaearctic (Olson et al., 2001).
175 ● Traits: continuous, ordinal, categorical or binary – units, the number of observations (usually
176 species), and whether there was a significant response to extinction risk for that test. We grouped
177 traits with similar biological meaning into the same unified trait (e.g. body length and body mass
178 into ‘body size’). Henceforth, trait refers to these unified traits. Finally, to identify meaningful
179 groups of traits and facilitate analysis, we clustered these traits into meaningful classes (see next
180 section).
181 ● Statistical methodology: generalized linear models and linear models (including parametric tests of
182 the generalized linear models’ and linear models’ families, Pearson correlations, G-tests, analysis of
183 variance and analysis of covariance, chi-square statistics), generalized linear mixed models
184 (GLMMs) and linear mixed models (parametric tests of the generalized mixed linear models’ and
185 linear mixed models’ families), decision tree based (including random forests, boosted or non-
186 boosted classification and regression trees), generalized estimating equations, phylogenetic
187 comparative methods (PGCMs; which include phylogenetic independent contrasts and phylogenetic
188 generalized least squares), non-parametric (Kruskal–Wallis, Spearman rank correlations, etc.), and
189 other (including discriminant analysis, path analysis, randomization tests, skewness tests, structural
190 equation modelling).
8 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
191 ● Phylogenetic control: we distinguished absolute no control of phylogeny from at least some control
192 of phylogeny (via using phylogenetic trees or a taxonomical higher group as a controlling or
193 covariable in the analyses).
194 ● Proxy of extinction used: IUCN based (where the response variable – ordinal or binary – is based
195 upon IUCN assessments of species), other red lists (based on non-IUCN red lists), temporal trends
196 (in which the trend can be the difference between population numbers, or geographical range size,
197 between two time periods, or the slope of the variation in the population numbers over time), and
198 everything else as others (recently extinct species versus extant species, extirpated versus non-
199 extirpated, etc.).
200 (b) Trait classes
201 We named nine natural groups (or categories) of traits (Table 1). All these categories describe a property of a
202 focal species, but they differ in the general type of such property:
203 ● Individual: traits measurable at an individual level: morphological traits, traits related to
204 reproductive output and life-cycle speed.
205 ● Population: traits that describe a property of a whole population. Such properties can relate to how
206 individuals of the populations of the focal species aggregate, whether they are social, or other traits
207 only measurable at the population level, like population density or abundance.
208 ● Community: in this category, traits describe the interactions between the focal species and other
209 species in the community, whether this interaction is trophic or other, such as symbiosis,
210 commensalism or pollination.
211 ● Evolution: including traits related to how successful a species might be in adapting to change
212 through evolution, as well as any other traits related to the evolutionary history of the species.
213 ● Space: traits describing the current or past geographical distribution in space of the species or its
214 ability to disperse.
215 ● Time: traits that describe how species use time at different scales, such as daily or seasonal cycles.
216 ● Environmental niche: traits mainly describing species interactions with their biotic or abiotic niche,
217 such as habitat affinities and preferential climatic ranges.
9 bioRxiv preprint doi: https://doi.org/10.1101/408096; this version posted September 4, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
218 ● Human influence: traits that describe the interplay between species and humans, such as habitat loss
219 due to anthropogenic causes and hunting or other uses of the species.
220 ● Taxonomy: taxonomic affinity or hierarchy of the species.
221 ● Other: traits that do not fit into any of the previous groups.
222 (c) Exploratory analysis
223 In the exploratory analysis we first compared the number of studies across taxa and combinations of taxa
224 across biogeographical realms, proxy of extinction used, statistical methodology, and phylogeny. Next, we
225 compared the number of studies in which each trait was used, and calculated the percentage of significant
226 measurements of each trait. Statistical tests which did not assign significance levels to traits had to be
227 excluded from this step (e.g. most decision tree methodologies).
228 (d) Meta-analysis
229 To understand whether traits were positively or negatively related to extinction risk across the multiple
230 studies, we performed meta-analyses for each continuous trait. Meta-analyses are useful because they allow
231 the comparison of outcomes from different studies by converting the outcomes to effect sizes. The use of
232 Fisher’s Z as the effect size has the advantage of allowing very diverse statistical methodologies into the
233 same effect size measurement. Effect sizes were obtained by transforming the statistics reported in the
234 manuscripts (F, z, Χ2, t or r2) into Pearson’s product-moment correlation coefficients (r) by applying
235 equations (1) to (5) (Rosenthal, 1991) and then transforming r into Fisher’s Z using equation (6) using R
236 package metafor (Viechtbauer, 2010):
237 √ (1)