Generalized Linear Models and Point Count Data: Statistical Considerations for the Design and Analysis of Monitoring Studies
Nathaniel E. Seavy,1,2,3 Suhel Quader,1,4 John D. Alexander,2 and C. John Ralph5

1 Department of Zoology, University of Florida, Gainesville, FL 32611-8525
2 Klamath Bird Observatory, Box 758, Ashland, OR 97520
3 Corresponding author: e-mail: [email protected]
4 Present address: Department of Zoology, University of Cambridge, Downing Street, Cambridge CB2 3EJ, United Kingdom
5 USDA Forest Service, Redwood Sciences Laboratory, Pacific Southwest Research Station, 1700, Bayview Drive, Arcata, CA 95521

USDA Forest Service Gen. Tech. Rep. PSW-GTR-191. 2005

Abstract

The success of avian monitoring programs to effectively guide management decisions requires that studies be efficiently designed and data be properly analyzed. A complicating factor is that point count surveys often generate data with non-normal distributional properties. In this paper we review methods of dealing with deviations from normal assumptions, and we focus on the application of generalized linear models (GLMs). We also discuss problems associated with overdispersion (more variation than expected). In order to evaluate the statistical power of these models to detect differences in bird abundance, it is necessary for biologists to identify the effect size they believe is biologically significant in their system. We illustrate one solution to this challenge by discussing the design of a monitoring program intended to detect changes in bird abundance as a result of Western juniper (Juniperus occidentalis) reduction projects in central Oregon. We estimate biologically significant effect sizes by using GLMs to describe variation in bird abundance relative to natural variation in juniper cover. These analyses suggest that for species typically positively associated with juniper cover, a 60-80 percent decrease in abundance may be expected as a result of juniper reduction projects. With these estimates of expected effect size and preliminary data on bird abundance, we use computer simulations to investigate the power of GLMs. Our simulations demonstrate that when data are not overdispersed and sample sizes are relatively large, the statistical power of GLMs is approximated well by formulas that are currently available in the bird literature for other statistical techniques. When data are overdispersed, as may be the case with most point count data, power is reduced.

Key words: Generalized linear models, juniper removal, monitoring, overdispersion, point count, Poisson.

Introduction

Measuring changes in bird abundance over time and in response to habitat management is widely recognized as an important aspect of ecological monitoring (Greenwood et al. 1993). The use of long-term avian monitoring programs (e.g., the Breeding Bird Survey) to identify population trends is a powerful tool for bird conservation (Sauer and Droege 1990, Sauer and Link 2002). Short-term studies comparing bird abundance in treated and untreated areas are also important because they can identify changes in bird responses to specific management actions (Nichols 1999). However, the ability of avian monitoring programs to effectively guide management decisions requires that studies be designed to detect differences that are biologically meaningful.

The statistical ability to detect differences in abundance between two treatments (e.g., habitats or management practices) is described by statistical power. To evaluate power one must answer two questions. (1) What statistical models are appropriate for the distributional properties of the data? Probably the most widely used method for monitoring bird abundance is point count surveys (Ralph et al. 1995). However, the properties of data produced from point counts may violate assumptions of commonly used statistical techniques. Count data often show non-normal distributions, especially when abundances are low and there are many zeros in the data. One approach to analyzing these data is to use generalized linear models. With such models, one may specify non-normal distributions and results can be interpreted in a manner similar to the familiar analysis of variance framework. (2) What is a biologically meaningful effect size? That is, what difference in abundance can be considered biologically, not just statistically, significant? Once these two questions have been answered, power analyses can be carried out to identify the appropriate sampling effort needed for a rigorous study design (Nur et al. 1999, Foster 2001).

In this paper we discuss some properties of count data with an emphasis on how they can be analyzed with generalized linear models. We then use these models to analyze point count data from central Oregon and evaluate variation in bird abundance as a function of natural variation in Western juniper (Juniperus occidentalis) cover. We use the results to define a biologically significant effect of juniper removal and conduct computer simulations to investigate the power of generalized linear models to detect changes in bird abundance resulting from reduction of juniper cover. Our objective is to illustrate how the design of monitoring projects based on point count data can be improved by the definition of a priori effect sizes, the application of generalized linear models, and attention to the distributional properties of the data.

Distributional Properties of Count Data

The fixed-radius point count method of measuring bird abundance generates count data (i.e., the number of birds at a station). The distribution of such data is bounded by zero because one cannot detect a negative number of birds. If stations are visited once, the number of individuals detected per station will always be either zero or a positive integer. When bird abundances are low, the frequency distribution of detections is likely to be highly non-normal (right-skewed, with a majority of observations at or near zero). If this is the case, then the use of statistical tests that assume normal distributions (e.g., t-tests, least-squares regressions) may be inappropriate. Perhaps the simplest approach to comparing patterns of abundance measured by point counts is to use tests that are more flexible about the distributional properties of the data. Such approaches include non-parametric tests (Sokal and Rohlf 1995, Zar 1999) or resampling tests that use permutations of the data to make statistical inferences (Manly 1991, Crowley 1992).

Another approach is to transform the data such that they meet assumptions of statistical methods based on the normal distribution. Such transformations must fulfill two objectives: they must standardize the shape of the distribution to the normal curve and decouple any relationship between the mean and variance. Often, square-root or log transformations are suggested for Poisson distributed count data, in which the variance increases with the mean (Sokal and Rohlf 1995, Kleinbaum et al. 1998, Zar 1999). When counts contain zeros, a constant must be added prior to log transformation, and is often suggested for square root transformation. Values of 0.5 or 3/8 for this constant are suggested (Sokal and Rohlf 1995, Zar 1999), but should be used with the recognition that the choice of constants may have implications for statistical estimation (Thomas and Martin 1996).

An alternative to nonparametric statistics or data transformations is the use of statistical techniques in which non-normal distributions can be specified. Generalized linear models (GLMs) are a flexible and widely used class of such models that can accommodate continuous, count, proportion, and presence/absence response variables (McCullagh and Nelder 1989, Agresti 1996, Crawley 1997, Dobson 2002). Many statistical packages (e.g., SAS, SPSS) implement these models for logistic, Poisson, and negative binomial regression. Although the application of GLMs to point count data is not new (Link and Sauer 1998, Brand and George 2001, Robinson et al. 2001), we review these models here to provide the context for our estimates of effect size and power.

Generalized Linear Models

The models with which biologists are most familiar are linear models of the form

$$y = b_0 + b_1 x_1 + \dots + b_j x_j + e$$

where the first part of the right-hand side of the equation ($b_0 + b_1 x_1 + \dots + b_j x_j$) specifies the expected value of $y$ given $x_1 \dots x_j$. This is called the 'mean' or 'systematic' part of the equation and contains a linear combination of $x$ variables. The random component of the model (denoted by $e$), describes the deviation of the observed $y$ values from the expected, and is assumed to be drawn from a normal distribution with constant variance. Generalized linear models (GLMs) extend simple linear models by allowing transformations to linearity in the mean part of the model, as well as non-normal random components (Agresti 1996). If $\mu$ is the expected value of $y$, then we can model not $\mu$ itself, but some function $f(\mu)$ of the expected value

$$f(\mu) = b_0 + b_1 x_1 + \dots + b_j x_j$$

The function $f(\mu)$ is called the link function. For the simple linear model above, the link function is $f(\mu) = \mu$. An alternative for loglinear models is to use a log link function, $f(\mu) = \log(\mu)$. Our model now becomes

$$\log(\mu) = b_0 + b_1 x_1 + \dots + b_j x_j$$

What about the random component of such a model? For count data, the random component is often likely to follow a Poisson distribution (Zar 1999), a distribution of integers between zero and infinity. The best-fit model is found by maximum likelihood techniques rather than minimizing the sums-of-squares. For a GLM with Poisson error, the usual link function is $\log(\mu)$ as above. Note that when the parameter estimates $b_0 \dots b_j$ are used to calculate a predicted value for

evidence of a true difference, it does not imply that the treatments are the same (Johnson 1999).