Estimating Species Richness

OUP CORRECTED PROOF – FINAL, 18/10/2010, SPi

BIOLOGICAL DIVERSITY

CHAPTER 4

Estimating species richness

Nicholas J. Gotelli and Robert K. Colwell

4.1 Introduction

Measuring species richness is an essential objective for many community ecologists and conservation biologists. The number of species in a local assemblage is an intuitive and natural index of community structure, and patterns of species richness have been measured at both small (e.g. Blake & Loiselle 2000) and large (e.g. Rahbek & Graves 2001) spatial scales. Many classic models in community ecology, such as the MacArthur–Wilson equilibrium model (MacArthur & Wilson 1967) and the intermediate disturbance hypothesis (Connell 1978), as well as more recent models of neutral theory (Hubbell 2001), metacommunity structure (Holyoak et al. 2005), and biogeography (Gotelli et al. 2009) generate quantitative predictions of the number of coexisting species. To make progress in modelling species richness, these predictions need to be compared with empirical data. In applied ecology and conservation biology, the number of species that remain in a community represents the ultimate 'scorecard' in the fight to preserve and restore perturbed communities (e.g. Brook et al. 2003).

Yet, in spite of our familiarity with species richness, it is a surprisingly difficult variable to measure. Almost without exception, species richness can be neither accurately measured nor directly estimated by observation because the observed number of species is a downward-biased estimator for the complete (total) species richness of a local assemblage. Hundreds of papers describe statistical methods for correcting this bias in the estimation of species richness (see also Chapter 3), and special protocols and methods have been developed for estimating species richness for particular taxa (e.g. Agosti et al. 2000). Nevertheless, many recent studies continue to ignore some of the fundamental sampling and measurement problems that can compromise the accurate estimation of species richness (Gotelli & Colwell 2001).

In this chapter we review the basic statistical issues involved with species richness estimation. Although a complete review of the subject is beyond the scope of this chapter, we highlight sampling models for species richness that account for undersampling bias by adjusting or controlling for differences in the number of individuals and the number of samples collected (rarefaction) as well as models that use abundance or incidence distributions to estimate the number of undetected species (estimators of asymptotic richness).

4.2 State of the field

4.2.1 Sampling models for biodiversity data

Although the methods of estimating species richness that we discuss can be applied to assemblages of organisms that have been identified by genotype (e.g. Hughes et al. 2000), to species, or to some higher taxonomic rank, such as genus or family (e.g. Bush & Bambach 2004), we will write 'species' to keep it simple. Because we are discussing estimation of species richness, we assume that one or more samples have been taken, by collection or observation, from one or more assemblages for some specified group or groups of organisms. We distinguish two kinds of data used in richness studies: (1) incidence data, in which each species detected in a sample from an assemblage is simply noted as being present, and (2) abundance data, in which the abundance of each species is tallied within each sample. Of course, abundance data can always be converted to incidence data, but not the reverse.

Box 4.1 Observed and estimated richness

$S_{obs}$ is the total number of species observed in a sample, or in a set of samples.

$S_{est}$ is the estimated number of species in the assemblage represented by the sample, or by the set of samples, where est is replaced by the name of an estimator.

Abundance data. Let $f_k$ be the number of species each represented by exactly $k$ individuals in a single sample. Thus, $f_0$ is the number of undetected species (species present in the assemblage but not included in the sample), $f_1$ is the number of singleton species, $f_2$ is the number of doubleton species, etc. The total number of individuals in the sample is $n = \sum_{k \ge 1} k f_k$.

Replicated incidence data. Let $q_k$ be the number of species present in exactly $k$ samples in a set of replicate incidence samples. Thus, $q_0$ is the number of undetected species (species present in the assemblage but not included in the set of samples), $q_1$ is the number of unique species, $q_2$ is the number of duplicate species, etc. The total number of samples is $m$.

Chao 1 (for abundance data)

$S_{Chao1} = S_{obs} + \frac{f_1^2}{2 f_2}$ is the classic form, but is not defined when $f_2 = 0$ (no doubletons).

$S_{Chao1} = S_{obs} + \frac{f_1 (f_1 - 1)}{2 (f_2 + 1)}$ is a bias-corrected form, always obtainable.

$\operatorname{var}(S_{Chao1}) = f_2 \left[ \frac{1}{2} \left( \frac{f_1}{f_2} \right)^2 + \left( \frac{f_1}{f_2} \right)^3 + \frac{1}{4} \left( \frac{f_1}{f_2} \right)^4 \right]$ for $f_1 > 0$ and $f_2 > 0$ (see Colwell 2009, Appendix B of EstimateS User's Guide for other cases and for asymmetrical confidence interval computation).

Chao 2 (for replicated incidence data)

$S_{Chao2} = S_{obs} + \frac{q_1^2}{2 q_2}$ is the classic form, but is not defined when $q_2 = 0$ (no duplicates).

$S_{Chao2} = S_{obs} + \left( \frac{m - 1}{m} \right) \frac{q_1 (q_1 - 1)}{2 (q_2 + 1)}$ is a bias-corrected form, always obtainable.

$\operatorname{var}(S_{Chao2}) = q_2 \left[ \frac{1}{2} \left( \frac{q_1}{q_2} \right)^2 + \left( \frac{q_1}{q_2} \right)^3 + \frac{1}{4} \left( \frac{q_1}{q_2} \right)^4 \right]$ for $q_1 > 0$ and $q_2 > 0$ (see Colwell 2009, Appendix B of EstimateS User's Guide for other cases and for asymmetrical confidence interval computation).

ACE (for abundance data)

$S_{rare} = \sum_{k=1}^{10} f_k$ is the number of rare species in a sample (each with 10 or fewer individuals).

$S_{abund} = \sum_{k \ge 11} f_k$ is the number of abundant species in a sample (each with more than 10 individuals).

$n_{rare} = \sum_{k=1}^{10} k f_k$ is the total number of individuals in the rare species.

The sample coverage estimate is $C_{ACE} = 1 - \frac{f_1}{n_{rare}}$, the proportion of all individuals in rare species that are not singletons. Then the ACE estimator of species richness is

$$S_{ACE} = S_{abund} + \frac{S_{rare}}{C_{ACE}} + \frac{f_1}{C_{ACE}} \gamma^2_{ACE},$$

where $\gamma^2_{ACE}$ is the squared coefficient of variation,

$$\gamma^2_{ACE} = \max \left[ \frac{S_{rare}}{C_{ACE}} \frac{\sum_{k=1}^{10} k (k - 1) f_k}{n_{rare} (n_{rare} - 1)} - 1, \; 0 \right].$$

The formula for ACE is undefined when all rare species are singletons ($f_1 = n_{rare}$, yielding $C_{ACE} = 0$). In this case, compute the bias-corrected form of Chao1 instead.

ICE (for incidence data)

$S_{infr} = \sum_{k=1}^{10} q_k$ is the number of infrequent species in a sample (each found in 10 or fewer samples).

$S_{freq} = \sum_{k \ge 11} q_k$ is the number of frequent species in a sample (each found in more than 10 samples).

$n_{infr} = \sum_{k=1}^{10} k q_k$ is the total number of incidences in the infrequent species.

The sample coverage estimate is $C_{ICE} = 1 - \frac{q_1}{n_{infr}}$, the proportion of all incidences of infrequent species that are not uniques. Then the ICE estimator of species richness is

$$S_{ICE} = S_{freq} + \frac{S_{infr}}{C_{ICE}} + \frac{q_1}{C_{ICE}} \gamma^2_{ICE},$$

where $\gamma^2_{ICE}$ is the squared coefficient of variation,

$$\gamma^2_{ICE} = \max \left[ \frac{S_{infr}}{C_{ICE}} \frac{m_{infr}}{m_{infr} - 1} \frac{\sum_{k=1}^{10} k (k - 1) q_k}{(n_{infr})^2} - 1, \; 0 \right],$$

and $m_{infr}$ is the number of samples that contain at least one infrequent species.

The formula for ICE is undefined when all infrequent species are uniques ($q_1 = n_{infr}$, yielding $C_{ICE} = 0$). In this case, compute the bias-corrected form of Chao2 instead.

Jackknife estimators (for abundance data)

The first-order jackknife richness estimator is

$$S_{jackknife1} = S_{obs} + f_1.$$

The second-order jackknife richness estimator is

$$S_{jackknife2} = S_{obs} + 2 f_1 - f_2.$$

Jackknife estimators (for incidence data)

The first-order jackknife richness estimator is

$$S_{jackknife1} = S_{obs} + q_1 \left( \frac{m - 1}{m} \right).$$

The second-order jackknife richness estimator is

$$S_{jackknife2} = S_{obs} + \left[ \frac{q_1 (2m - 3)}{m} - \frac{q_2 (m - 2)^2}{m (m - 1)} \right].$$

By their nature, sampling data document only the verified presence of species in samples. The absence of a particular species in a sample may represent either a true absence (the species is not present in the assemblage) or a false absence (the species is present, but was not detected in the sample; see Chapter 3). Although the term 'presence/absence data' is often used as a synonym for incidence data, the importance of distinguishing true absences from false ones (not only for richness estimation, but in modelling contexts, e.g. Elith et al. 2006) leads us to emphasize that incidence data are actually 'presence data'. Richness estimation methods for abundance data assume that organisms can be sampled and identified as distinct individuals. For clonal and colonial organisms, such as many species of grasses and corals, individuals cannot always be separated or counted, but methods designed for incidence data can nonetheless be used if species presence is recorded within [...]

[...] inferences about the number of colours (species) in the entire jar. This process of statistical inference depends critically on the biological assumption that the community is 'closed', with an unchanging total number of species and a steady species abundance distribution. Jellybeans may be added or removed from the jar, but the proportional representation of colours is assumed to remain the same. In an open metacommunity, in which the assemblage changes size and composition through time, it may not be possible to draw valid inferences about community structure from a snapshot sample at one point in time (Magurran 2007). Few, if any, real communities are completely 'closed', but many are sufficiently circumscribed that richness estimators may be used, albeit with caution and caveats.

For all of the methods and metrics (Box 4.1) that we discuss in this chapter, we make the closely related statistical assumption that sampling is with replacement.
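As an illustration only, several of the estimators in Box 4.1 can be sketched in Python. This is a minimal sketch under stated assumptions, not the EstimateS implementation: the function names (`frequency_counts`, `chao1`, `ace`, `jackknife_abundance`, `to_incidence`, `chao2`) are hypothetical, abundance data are assumed to arrive as a list of per-species counts, replicated samples as a non-empty list of rows (one row per sample, one column per species), and the return value of `ace` when no rare species are present is our own choice rather than something the box specifies.

```python
from collections import Counter


def frequency_counts(totals):
    """Map k -> number of species whose total (individuals or incidences) is k, for k >= 1."""
    return Counter(t for t in totals if t > 0)


def chao1(abundances, bias_corrected=True):
    """Chao1 from per-species abundances in a single sample (Box 4.1)."""
    f = frequency_counts(abundances)
    s_obs = sum(f.values())
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if bias_corrected:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    if f2 == 0:
        raise ValueError("classic Chao1 is undefined when f2 == 0")
    return s_obs + f1 ** 2 / (2 * f2)


def ace(abundances, cutoff=10):
    """ACE with the rare/abundant cutoff of 10 individuals used in Box 4.1."""
    f = frequency_counts(abundances)
    f1 = f.get(1, 0)
    rare = {k: c for k, c in f.items() if k <= cutoff}
    s_rare = sum(rare.values())
    s_abund = sum(f.values()) - s_rare
    n_rare = sum(k * c for k, c in rare.items())
    if s_rare == 0:
        # Assumption of this sketch: with no rare species, report S_abund.
        return float(s_abund)
    if f1 == n_rare:
        # C_ACE would be 0; fall back to bias-corrected Chao1 as the box advises.
        return chao1(abundances, bias_corrected=True)
    c_ace = 1 - f1 / n_rare
    spread = sum(k * (k - 1) * c for k, c in rare.items())
    gamma2 = max((s_rare / c_ace) * spread / (n_rare * (n_rare - 1)) - 1, 0)
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma2


def jackknife_abundance(abundances):
    """Return the (first-order, second-order) jackknife estimates for abundance data."""
    f = frequency_counts(abundances)
    s_obs = sum(f.values())
    f1, f2 = f.get(1, 0), f.get(2, 0)
    return s_obs + f1, s_obs + 2 * f1 - f2


def to_incidence(samples):
    """Convert abundance rows to presence (1) / non-detection (0) rows."""
    return [[1 if count > 0 else 0 for count in row] for row in samples]


def chao2(samples):
    """Bias-corrected Chao2 from replicated samples (rows) by species (columns)."""
    m = len(samples)
    per_species = [sum(col) for col in zip(*to_incidence(samples))]
    q = frequency_counts(per_species)
    s_obs = sum(q.values())
    q1, q2 = q.get(1, 0), q.get(2, 0)
    return s_obs + (m - 1) / m * q1 * (q1 - 1) / (2 * (q2 + 1))
```

For the abundance vector `[1, 1, 1, 2, 2, 5, 12, 20]` (so $S_{obs} = 8$, $f_1 = 3$, $f_2 = 2$), the sketch gives a bias-corrected Chao1 of 9.0, a classic Chao1 of 10.25, and jackknife estimates of 11 and 12. Note that `to_incidence` discards abundances, which mirrors the point that abundance data can be converted to incidence data but not the reverse.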
