
CHAPTER 9

Data Transformations

Most data sets benefit from one or more data transformations. The reasons for transforming data can be grouped into statistical and ecological reasons:

Statistical

• improve assumptions of normality, linearity, homogeneity of variance, etc.

• make units of attributes comparable when measured on different scales (for example, if you have elevation ranging from 100 to 2000 meters and slope from 0 to 30 degrees)

Ecological

• make distance measures work better

• reduce the effect of total quantity (sample unit totals) to put the focus on relative quantities

• equalize (or otherwise alter) the relative importance of common and rare species

• emphasize informative species at the expense of uninformative species.

Monotonic transformations are applied to each element of the data matrix, independent of the other elements. They are "monotonic" because they change the values of the data points without changing their rank. Relativizations adjust matrix elements by a row or column standard (e.g., maximum, sum, mean, etc.). One transformation described below, Beals smoothing, is unique in being a probabilistic transformation based on both row and column relationships. In this chapter, we also describe other adjustments to the data matrix, including deleting rare species, combining entities, and calculating first differences for time series data.

It is difficult to overemphasize the potential importance of transformations. They can make the difference between illusion and insight, fog and clarity. To use transformations effectively requires a good understanding of their effects and a clear vision of your goals.

Notation. In all of the transformations described below,

x_ij = the original value in row i and column j of the data matrix
b_ij = the adjusted value that replaces x_ij

Domains and ranges

Bear in mind that some transformations are unreasonable or even impossible for certain types of data. Table 9.1 lists the kinds of data that are potentially usable for each transformation.

Monotonic transformations

Power transformation

b_ij = x_ij^p

Different parameters (exponents) for the transformation change the effect of the transformation; p = 0 gives presence/absence, p = 0.5 gives the square root, etc. The smaller the parameter, the more compression is applied to high values (Fig. 9.1).

The square root transformation is similar in effect to, but less drastic than, the log transform. Unlike the log transform, special treatment of zeros is not needed. The square root transformation is commonly used. Less frequent is a higher root, such as a cube root or fourth root (Fig. 9.1). For example, Smith et al. (2001) applied a cube root to count data, a choice supported by an optimization procedure. Roots at a power higher than three nearly transform to presence-absence: nonzero values become close to one, while zeros remain at zero.

Figure 9.1. Effect of square root and higher root transformations, b = f(x), for powers 1/2, 1/3, 1/4, and 1/10. Note that roots higher than three are essentially presence-absence transformations, yielding values close to 1 for all nonzero values.
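As an illustration (not part of the book or its software), the family of power transformations in Figure 9.1 can be sketched in Python with NumPy; the function name and example values are our own:

```python
import numpy as np

def power_transform(x, p):
    """Power transformation b = x**p for nonnegative abundance data.
    p = 1 leaves the data unchanged, p = 0.5 is the square root, and
    p = 0 is taken as presence/absence (1 for any nonzero value)."""
    x = np.asarray(x, dtype=float)
    if p == 0:
        return (x > 0).astype(float)
    return x ** p

abundances = np.array([0.0, 1.0, 4.0, 25.0, 100.0])
print(power_transform(abundances, 0.5))       # square root
print(power_transform(abundances, 1.0 / 4))   # fourth root: nonzero values approach 1
print(power_transform(abundances, 0))         # presence/absence
```

The smaller the exponent, the more the high values are compressed toward 1, which is the behavior Figure 9.1 illustrates.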


Table 9.1. Domain of input and range of output from transformations.

Transformation                        Reasonable and acceptable domain of x    Range of f(x)

MONOTONIC TRANSFORMATIONS
  x^0 (power)                         all                                      0 or 1 only
  x^p (power)                         nonnegative                              nonnegative
  log(x)                              positive                                 all
  (2/π)·arcsin(x)                     0 ≤ x ≤ 1                                0 to 1 inclusive
  (2/π)·arcsin(x^1/2)                 0 ≤ x ≤ 1                                0 to 1 inclusive

SMOOTHING
  Beals smoothing                     0 or 1 only                              0 to 1 inclusive

ROW/COLUMN RELATIVIZATIONS
  general                             nonnegative                              0 to 1 inclusive
  by maximum                          nonnegative                              0 to 1 inclusive
  by mean                             all                                      all
  by standard deviates                all                                      generally between -10 and 10
  binary by mean                      all                                      0 or 1 only
  rank                                all                                      positive integers
  binary by median                    all                                      0 or 1 only
  ubiquity                            nonnegative                              nonnegative
  information function of ubiquity    nonnegative                              nonnegative

Logarithmic transformation

b_ij = log(x_ij)

Log transformation compresses high values and spreads low values by expressing the values as orders of magnitude. Log transformation is often useful when there is a high degree of variation within variables, or when there is a high degree of variation among attributes within a sample. These are commonly true with count data and biomass data.

Log transformations are extremely useful for many kinds of environmental and habitat variables, the lognormal distribution being one of the most common in nature. See Limpert et al. (2001) for a general introduction to lognormal distributions and applications in various sciences. Some authors claim that the abundance of a species follows a truncated lognormal distribution, citing Sugihara (1980) and Magurran (1988). While the nonzero values of community data sets often resemble a lognormal distribution, excluding zeros often amounts to ignoring half of a data set. The lognormal distribution is fundamentally flawed when applied to community data because a zero value is, more often than not, the most frequent abundance value for a species. Nevertheless, the log transformation is extremely useful in community analysis, provided that one carefully handles the problem of log(0) being undefined.

To log-transform data containing zeros, a small number must be added to all data points. If the lowest nonzero value in the data is one (as in count data), then it is best to add one before applying the transformation:

b_ij = log(x_ij + 1)
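For count data, the add-one log transformation above can be sketched as follows (base-10 logs, as in the chapter's examples; the function name is ours):

```python
import numpy as np

def log_plus_one(x):
    """b_ij = log10(x_ij + 1): appropriate when the smallest nonzero
    value in the data is 1, as in count data; zeros remain zero."""
    return np.log10(np.asarray(x, dtype=float) + 1.0)

counts = np.array([0, 1, 9, 99, 999])
# zeros stay at zero; 9 -> 1, 99 -> 2, 999 -> 3 (orders of magnitude)
print(log_plus_one(counts))
```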

If, however, the lowest nonzero value of x differs from one by more than an order of magnitude, then adding one will distort the relationship between zeros and other values in the data set. For example, biomass data often contain many small decimal fractions (values such as 0.00345 and 0.00332) ranging up to fairly large values (in the hundreds). Adding a one to the whole data set will tend to compress the resulting distribution at the low end of the scale. The order-of-magnitude difference between 0.003 and 0.03 is lost if you add a one to both values before log transformation: log(1.003) is about the same as log(1.03).

The following transformation is a generalized procedure that (a) tends to preserve the original order of magnitudes in the data and (b) results in values of zero when the initial value was zero. Given:

Min(x) is the smallest nonzero value in the data
Int(x) is a function that truncates x to an integer by dropping digits after the decimal point
c = order-of-magnitude constant = Int(log(Min(x)))
d = decimal constant = log^-1(c)

then the transformation is

b_ij = log(x_ij + d) - c

Subtracting the constant c from each element of the data set after the log transformation shifts the values such that the lowest value in the data set will be a zero. For example, if the smallest nonzero value in the data set is 0.00345, then

log(min(x)) = -2.46
c = Int(log(min(x))) = -2
d = log^-1(c) = 0.01

Applying the transformation to some example values: If x = 0, then b = log(0 + 0.01) - (-2), therefore b = 0. If x = 0.00345, then b = log(0.00345 + 0.01) - (-2), therefore b = 0.128.

Arcsine transformation

b_ij = (2/π)·arcsin(x_ij)

The constant 2/π scales the result of arcsin(x) [in radians] to range from 0 to 1, assuming that 0 ≤ x ≤ 1. The function arcsin is the same as sin^-1 or inverse sine. Data must range between zero and one, inclusive. If they do not, you should relativize before selecting this transformation.

Unlike the arcsine-squareroot transformation, an arcsine transformation is usually counterproductive in community ecology, because it tends to spread the high values and compress the low values (Fig. 9.2). This might be useful for distributions with negative skew, but community data almost always have positive skew.

Arcsine squareroot transformation

b_ij = (2/π)·arcsin(√x_ij)

The arcsine-squareroot transformation spreads the ends of the scale for proportion data, while compressing the middle (Fig. 9.2). This transformation is recommended by many statisticians for proportion data, often improving normality (Sokal and Rohlf 1995). The data must range between zero and one, inclusive. The arcsine-squareroot is multiplied by 2/π to rescale the result so that it ranges from 0 to 1.

The logit transformation, b = ln(x/(1-x)), is also sometimes used for proportion data (Sokal and Rohlf 1995). However, if x = 0 or x = 1, then the logit is undefined. Often a small constant is added to prevent ln(0) and division by zero. Alternatively, empirical logits may be used (see Sokal and Rohlf 1995:762). Because zeros are so common in community data, it seems reasonable to use the arcsine squareroot or squareroot transformations to avoid this problem.
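A sketch of the generalized log procedure described above (base-10 logs; Int() implemented with truncation toward zero; the function name is ours):

```python
import numpy as np

def generalized_log(x):
    """Generalized log transformation b_ij = log10(x_ij + d) - c, with
    c = Int(log10(Min(x))) for the smallest nonzero value Min(x), and
    d = 10**c. Zeros stay at zero, and order-of-magnitude differences
    among small values are preserved."""
    x = np.asarray(x, dtype=float)
    c = np.trunc(np.log10(x[x > 0].min()))  # Int(): drop digits after the decimal
    d = 10.0 ** c
    return np.log10(x + d) - c

data = np.array([0.0, 0.00345, 0.03, 300.0])
# 0 stays 0; 0.00345 -> ~0.13, matching the chapter's worked example
print(generalized_log(data))
```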

Figure 9.2. Effect of several transformations on proportion data, including arcsin(sqrt(x)) and arcsin(x).

Beals smoothing

Beals smoothing replaces each cell in the community matrix with a probability of the target species occurring in that particular sample unit, based on the joint occurrences of the target species with the species that are actually in the sample unit. The purpose of this transformation (also known as the sociological favorability index; Beals 1984) is to relieve the "zero-truncation problem" (Beals 1984). This problem is nearly universal in community data sets and most severe in heterogeneous community data sets that contain a large number of zeros (i.e., most samples contain a fairly small proportion of the species).

Beals smoothing replaces presence/absence or other binary data with quantitative values that represent the "favorability" of each sample for each species, regardless of whether the species was present in the sample. The index evaluates the favorability of a given sample for a target species based on the whole data set, using the proportions of joint occurrences between the species that do occur in the sample and the target species:

b_ij = (1/S_i) · Σ_k (M_jk / N_k)   for all k with x_ik ≠ 0

where S_i is the number of species in sample unit i, M_jk is the number of sample units with both species j and k, and N_k is the number of sample units with species k. This transformation is illustrated in Box 9.1.

This transformation is essentially a smoothing operation designed for community data (McCune 1994). As with any numerical smoothing, it tends to reduce the noise in the data by enhancing the strongest patterns. In this case the signal that is smoothed is the pattern of joint occurrences in the data. This is an extremely powerful transformation that is particularly effective on heterogeneous or noisy data. Caution is warranted, however, because, as for any smoothing function, this transformation can produce the appearance of reliable, consistent trends even from a series of random numbers.

This transformation should not be used on data sets with few zeros. It also should not be used if the data are quantitative and you do not want to lose this information. Beals smoothing can be slow to compute. If you have a large data set and a slow computer, be sure to allocate plenty of time. This transformation is available in PC-ORD but apparently not in other packages for statistical analysis.

Relativizations

"To relativize or not to relativize, that focuses the question." (Shakespeare? ????)

Relativizations rescale individual rows (or columns) in relation to some criterion based on the other rows (or columns). Any relativization can be applied to either rows or columns.

Relativization is an extremely important tool that all users of multivariate statistics in community ecology MUST understand. There is no right or wrong answer to the question of whether or not to relativize UNTIL one specifies the question and examines the properties of the data.

If the row totals are approximately equal, then relativization by rows will have little effect. Consistency of row totals can be evaluated by the coefficient of variation (CV) of the row totals (Table 9.2). The CV% is calculated as 100·(standard deviation / mean); in this case, it is the standard deviation of the row totals divided by the mean of the row totals.

Table 9.2. Evaluation of degree of variability in row or column totals, as measured with the coefficient of variation (CV) of row or column totals.

CV, %      Variability among rows (or columns)
< 50       Small. Relativization usually has a small effect on the qualitative outcome of the analysis.
50-100     Moderate (with a correspondingly moderate effect on the outcome of further analysis).
100-300    Large. Large effect on results.
> 300      Very large.
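The CV% criterion in Table 9.2 is easy to compute; here is a sketch (the chapter does not state whether the sample or population standard deviation is intended; this uses the sample form, and the function name is ours):

```python
import numpy as np

def cv_percent(totals):
    """CV% = 100 * standard deviation / mean, applied to the row
    (or column) totals, as used with Table 9.2."""
    totals = np.asarray(totals, dtype=float)
    return 100.0 * totals.std(ddof=1) / totals.mean()

community = np.array([[5.0, 3.0, 2.0],
                      [50.0, 30.0, 20.0],
                      [1.0, 1.0, 1.0]])
row_totals = community.sum(axis=1)   # [10, 100, 3]
print(cv_percent(row_totals))        # far above 100: "Large" in Table 9.2
```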

Box 9.1. Example of Beals smoothing

Data matrix X before transformation (3 sample units × 5 species):

        sp1  sp2  sp3  sp4  sp5   S_i
SU1      1    0    1    1    1     4
SU2      0    0    0    1    0     1
SU3      1    1    0    0    0     2
N_j      2    1    1    2    1

S_i = number of species in sample unit i. N_j = number of sample units with species j.

Construct matrix M, where M_jk = number of sample units with both species j and k. (Note that where j = k, M_jk = N_j.) Only the lower triangle is shown; M is symmetric.

               Species k
                1    2    3    4    5
Species j  1    2
           2    1    1
           3    1    0    1
           4    1    0    1    2
           5    1    0    1    1    1

Construct a new matrix B containing values transformed with the Beals smoothing function:

b_ij = (1/S_i) · Σ_k (M_jk / N_k)   for all k with x_ik ≠ 0

Data after transformation (B):

        sp1   sp2   sp3   sp4   sp5
SU1    0.88  0.13  0.75  0.88  0.75
SU2    0.50  0.00  0.50  1.00  0.50
SU3    1.00  0.75  0.25  0.25  0.25

Example for sample unit 1 and species 2:
b_12 = 1/4 (1/2 + 0/1 + 0/2 + 0/1)
b_12 = 1/4 (0.5)
b_12 = 0.125 (rounded to 0.13 in the matrix above)

Example for sample unit 3 and species 2:
b_32 = 1/2 (1/2 + 1/1)
b_32 = 1/2 (1.5)
b_32 = 0.75

Figure 9.3. Effect of various transformations on the relative weighting of species. Species abundance was measured on a continuous, quantitative scale. "Rank" is the order of species ranked by their abundance. [Panels plot sum of cover, frequency, or sum of p(occur) against species rank for the raw and transformed data.]
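Box 9.1 can be reproduced with a short function implementing the formula above. This is a sketch, not the book's software; note also that some implementations use variants of Beals smoothing (e.g., excluding the target species from its own sum), whereas this follows the Box 9.1 formula exactly:

```python
import numpy as np

def beals_smoothing(X):
    """Beals smoothing for a binary sample-unit-by-species matrix X.
    b_ij = (1/S_i) * sum over species k present in SU i of M_jk / N_k,
    where M_jk = number of SUs containing both species j and k, and
    N_k = number of SUs containing species k."""
    X = np.asarray(X, dtype=float)
    N = X.sum(axis=0)          # N_k: occurrences of each species
    M = X.T @ X                # joint-occurrence matrix; M_jj = N_j
    S = X.sum(axis=1)          # S_i: species richness of each SU
    B = np.zeros_like(X)
    for i in range(X.shape[0]):
        present = X[i] > 0
        # average M_jk / N_k over the species k present in SU i
        B[i] = (M[:, present] / N[present]).sum(axis=1) / S[i]
    return B

X = np.array([[1, 0, 1, 1, 1],    # the Box 9.1 matrix
              [0, 0, 0, 1, 0],
              [1, 1, 0, 0, 0]])
print(beals_smoothing(X).round(2))
```

Running this reproduces the B matrix shown in Box 9.1, including b_12 = 0.125 and b_32 = 0.75.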

[Figure 9.3 panels include: raw data; presence-absence; rank; sqrt(x); log(x); Beals smoothing; and relativization by species maximum.]

If the row or column totals are unequal, one must decide whether to retain this information as part of the analysis or whether to remove it by relativizing. One must justify this decision on biological grounds, not on its effect on the CV of row or column totals. For example, consider two quadrats with identical proportions of three species, but one quadrat has a total cover of 1% and the other has a total cover of 95%. If the data are relativized, then the quadrats appear similar or identical. If they are not relativized, then distance measures will consider them to be very different. Which choice is correct? The answer depends on the question. Does the question refer to proportions of different species, or is the total amount also important? If the latter is true, the data should not be relativized.

An example demonstrates how relativization can change the focus of the analysis. Menges et al. (1993) reported rates of vegetation change based on both relativized and nonrelativized tree species data, beginning with a matrix of basal area of each species in remeasured permanent plots. They used absolute rates to emphasize structural changes (e.g., increase in basal area of existing species) and relative rates to emphasize shifts in species composition (changes in the relative proportions of species).

Relativization is often used to put variables that were measured in different units on an equal footing. For example, a data set may contain counts for some species and cover for other species. In forest ecology, one may wish to combine basal area data for trees with cover data for herbs. If the species measured in different units are to be analyzed together, then one must relativize the data such that the quantity for each species is expressed as a proportion of some total or maximum abundance.

Relativizations can have a huge effect on the relative weighting of rare and abundant species. Raw quantitative data on a continuous scale tend to have a few abundant species and many rare species (Fig. 9.3). A multivariate analysis of these raw data might emphasize only a few species, ignoring most of the species. Log or square-root transformation of the data usually moderates the imbalance, while relativization by species totals can eliminate it completely (Fig. 9.3). This is, however, a drastic transformation. Rare species often occur haphazardly, so giving them a lot of weight greatly increases the noise in the analysis.

General relativization

By rows: b_ij = x_ij / (Σ_{j=1..q} x_ij^p)^(1/p)

By columns: b_ij = x_ij / (Σ_{i=1..n} x_ij^p)^(1/p)

for a matrix of n rows and q columns. The parameter p can be set to achieve different objectives. If p = 1, relativization is by row or column totals. This is appropriate when using analytical tools based on city-block distance measures, such as Bray-Curtis or Sorensen distance. If p = 2, you are "standardizing by the norm" (Greig-Smith 1983, p. 248). Using p = 2 is the Euclidean equivalent of relativization by row or column totals. It is appropriate when the analysis is based on a Euclidean distance. The same effect can be achieved by using "relative Euclidean distance" (see Chapter 6).

Relativization by maximum

b_ij = x_ij / xmax_j

where rows (i) are samples and columns (j) are species, and xmax_j is the largest value in the matrix for species j. As for relativization by species totals, this adjustment tends to equalize common and uncommon species. Relativization by species maxima equalizes the heights of peaks along environmental gradients, while relativization by species totals equalizes the areas under the curves of species responses.

Many people have found this to be an effective transformation for community data. A couple of cautions should be heeded, however: (1) very rare species can cause considerable noise in subsequent analyses if not omitted; (2) this and any other statistic based on extreme values can accentuate sampling error.

Adjustment to mean

The row or column mean is subtracted from each value, producing positive and negative numbers. If relativized by rows, the means are row means; if by columns, the means are column means. The negative numbers obviate proportion-based distance measures, such as Sorensen and Jaccard.
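The general relativization and relativization by maximum defined above can be sketched as follows (function names ours; the sketch assumes no all-zero rows or columns):

```python
import numpy as np

def relativize_rows(X, p=1):
    """General relativization by rows: b_ij = x_ij / (sum_j x_ij**p)**(1/p).
    p = 1 divides by row totals (suits city-block distances such as
    Sorensen/Bray-Curtis); p = 2 standardizes by the row norm (suits
    Euclidean distance)."""
    X = np.asarray(X, dtype=float)
    denom = (X ** p).sum(axis=1) ** (1.0 / p)
    return X / denom[:, np.newaxis]

def relativize_by_species_max(X):
    """Relativization by maximum: divide each column (species) by its
    largest value in the matrix."""
    X = np.asarray(X, dtype=float)
    return X / X.max(axis=0)

X = np.array([[1.0, 2.0, 2.0],
              [10.0, 20.0, 20.0]])
print(relativize_rows(X, p=1))   # both rows become [0.2, 0.4, 0.4]
print(relativize_rows(X, p=2))   # rows scaled to unit Euclidean length
```

With p = 1 the two quadrats in the cover example above would become identical, which is exactly the point of the biological-grounds caution in the text.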

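The adjustment to the mean can be sketched as follows (column-wise centering shown; the function name is ours):

```python
import numpy as np

def center_columns(X):
    """Adjustment to mean: subtract each column mean, producing positive
    and negative values centered on zero. Note the result is incompatible
    with proportion-based distance measures such as Sorensen."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0)

X = np.array([[1.0, 10.0],
              [3.0, 30.0]])
print(center_columns(X))   # each column now sums to zero
```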

This unstandardized centering procedure can have detrimental effects on analysis of community data: it tends to emphasize the values of zero more than does the raw data. Also, more variable species are reduced in importance relative to more constant species.

Adjustment to standard deviate

b_ij = (x_ij - x̄_j) / s_j

where x̄_j is the mean and s_j is the standard deviation within column j. Each transformed value represents the number of standard deviations by which it differs from the mean; these are often known as "z scores." As for all of the relativizations, this transformation can be applied to either rows or columns. It is, however, usually applied to variables (columns). This transformation results in all variables having mean = 0 and variance = 1.

Because this transformation produces both positive and negative numbers, it is NOT compatible with proportion-based distance measures, such as Sorensen's. While this transformation is of limited utility for species data, it can be a very useful relativization for environmental variables, placing them on an equal footing for a variety of purposes.

Binary with respect to mean

b_ij = 1 if x_ij > x̄;  b_ij = 0 if x_ij ≤ x̄

An element is assigned a zero if its value is less than or equal to the row or column mean, x̄. The element is assigned a one if its value is above the mean. Applied to species (columns), this transformation can be used to contrast above-average conditions with below-average conditions. The transformation therefore emphasizes the optimal parts of a species distribution. It also tends to equalize the influence of common and rare species. Applied to sample units, it emphasizes dominant species and is likely to eliminate many species, particularly those that rarely, if ever, occur in high abundances.

Rank adjustment

Matrix elements are assigned ranks within rows or columns such that the row or column totals are constant. Ties are assigned the average rank of the tied elements. For example, the values 1, 3, 3, 9, 10 would receive the ranks 1, 2.5, 2.5, 4, 5.

This transformation should be applied with caution. For example, most community data have many zeros. These zeros are counted as ties. Because the number of zeros in each row or column will vary, zeros will be transformed to different values, depending on the number of zeros in each row or column. For example, the values 0, 0, 0, 0, 6, 9 would receive the ranks 2.5, 2.5, 2.5, 2.5, 5, 6, while the values 0, 0, 6, 9 would receive the ranks 1.5, 1.5, 3, 4.

Binary with respect to median

b_ij = 1 if x_ij > median;  b_ij = 0 if x_ij ≤ median

The transformed values are zeros or ones. An element is assigned a zero if its value is less than or equal to the row or column median; it is assigned a one if its value is greater than the row or column median. This transformation can be used to emphasize the optimal parts of a species range, at the same time equalizing to some extent the weight given to rare and dominant species. The caution for the rank adjustment also applies to this relativization, because it too is based on ranks.

Weighting by ubiquity

b_ij = U_j · x_ij, where U_j = N_j / N

If rows are samples, columns are species, and relativization is by columns, more ubiquitous species are given more weight. Under these conditions, N_j is the number of samples in which species j occurs, and N is the total number of samples.

Information function of ubiquity

b_ij = I_j · x_ij

where

I_j = -p_j·log(p_j) - (1 - p_j)·log(1 - p_j)

and p_j = N_j / N, with N_j and N as defined above. To illustrate the effect of this relativization, assume that rows are samples, columns are species, and relativization is by columns. Maximum weight is applied to species occurring in half of the samples, because those species have the maximum information content, according to information theory. Very common and very rare species receive little weight. Note that if there are empty columns, the transformation will fail because the log of zero is undefined.

Double relativizations

The relativizations described above can be applied in various combinations to rows then columns, or vice versa. When applied in series, the last relativization necessarily mutes the effect of the preceding relativization.

The most common double relativization was first used by Bray and Curtis (1957). They first relativized by species maximum, equalizing the rare and abundant species; then they relativized by sample unit total. This and other double relativizations tend to equalize emphasis among SUs and among species. This comes at the cost of diminishing the intuitive meaning of individual data values.

Austin and Greig-Smith (1968) proposed a "contingency deviate" relativization. This measures the deviation from an expected abundance. The expected abundance is based on the assumption of independence of the species and the samples. Expected abundance is calculated from the marginal totals of the n × p data set, just as if it were a large contingency table:

b_ij = x_ij - (Σ_{j=1..p} x_ij)·(Σ_{i=1..n} x_ij) / (Σ_{i=1..n} Σ_{j=1..p} x_ij)

The resulting values include both negative and positive values and are centered on zero. The row and column totals become zero. Because this transformation produces negative numbers, it is incompatible with proportion-based distance measures.

One curious feature of this transformation is that zeros take on various values, depending on the marginal totals. The meaning of a zero is taken differently depending on whether the other elements of that row and column create large or small marginal totals. With sample unit × species data, a zero for an otherwise common species will be given more weight (i.e., a more negative value). This may be ecologically meaningful, but applied to rows the logic seems counter-intuitive: a species that is absent from an otherwise densely packed sample unit will also be given high weight.

Deleting rare species

Deleting rare species is a useful way of reducing the bulk and noise in your data set without losing much information. In fact, it often enhances the detection of relationships between community composition and environmental factors. In PC-ORD, you select deletion of columns "with fewer than N nonzero numbers." For example, if N = 3, then all species with fewer than 3 occurrences are deleted. If N = 1, all empty species (columns) are deleted.

Deleting rare species is clearly inappropriate if you wish to examine patterns in species diversity. Cao et al. (1999) correctly pointed this out, but confused the issue by citing proponents of deletion of rare species who were concerned with extracting patterns with multivariate analysis, not with comparison of species diversity. None of the authors they criticized suggested deleting rare species prior to analysis of species richness.

For multivariate analysis of correlation structure (in the broad sense), it is often helpful to delete rare species. As an approximate rule of thumb, consider deleting species that occur in fewer than 5% of the sample units. Depending on your purpose, however, you may wish to retain all species or eliminate an even higher percentage.

Some analysts object to removal of rare species on the grounds that we are discarding good information. Empirically, this can be shown true or false by using an external criterion of what is "good" information. You can try this yourself. Use a familiar data set that has at least a moderately strong relationship between communities and a measured environmental factor. Ordinate (Part 4) the full data set, rotate the solution to align it with that environmental variable (Ch. 15), and record the correlation coefficient between the environmental variable and the axis scores. Now delete all species occurring in just one sample unit. Repeat the ordination-rotation-correlation procedure. Progressively delete more species (those in only two sample units, etc.), until only the few most common species remain. Now plot the correlation coefficients against the number of species retained (Fig. 9.4).

In our experience, the correlation coefficient usually peaks at some intermediate level of retention of species (Fig. 9.4). When including all species, the noise from the rare ones weakens the structure slightly. On the other hand, when including only a few dominant species, too little redundancy remains in the data for the environmental gradient to be clearly expressed.

A second example compared the effect of stand structures on small mammals using a blocked design (D. Waldien 2002, unpublished). Fourteen species were enumerated in 24 stands, based on trapping data, then relativized by species maxima. The treatment effect size was measured with blocked MRPP (Ch. 24), using the A statistic (chance-corrected within-group agreement). Rare species were successively deleted, beginning with the rarest one, until only half of the species remained. In this case, removal of the four rarest species slightly increased the apparent effect size (Fig. 9.5).
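The occurrence-based deletion described in this section (PC-ORD's "fewer than N nonzero numbers" option) can be sketched as follows (function name ours):

```python
import numpy as np

def drop_rare_species(X, min_occurrences):
    """Remove species (columns) with fewer than min_occurrences nonzero
    values. Returns the trimmed matrix and the boolean mask of kept columns."""
    X = np.asarray(X, dtype=float)
    keep = (X > 0).sum(axis=0) >= min_occurrences
    return X[:, keep], keep

X = np.array([[1, 0, 3, 0],
              [2, 0, 1, 0],
              [0, 5, 4, 0]])
trimmed, kept = drop_rare_species(X, 2)
print(kept)      # species occurring in at least 2 sample units
print(trimmed)   # the empty column and the single-occurrence column are gone
```

To delete by percentage of sample units instead, one could pass `min_occurrences = ceil(0.05 * n_rows)` for the 5% rule of thumb.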

[Figure 9.4 plot: correlation coefficient versus criterion for species removal (occurrence in % of SUs), with separate curves for depth to water table, distance from stream, and elevation above stream.]

Figure 9.4. Correlation between ordination axis scores and environmental variables can often be improved by removal of rare species. In this case, the strength of relationship between hydrologic variables and vegetation, as measured by r², is maximized with removal of species occurring in fewer than 5-15% of the sample units, depending on the hydrologic variable. The original data set contained 88 species; 59, 35, 16, and 9 species remained after removal of species occurring in fewer than 5, 15, 40, and 45% of the sample units, respectively. Data are courtesy of Nick Otting (1996, unpublished).

The fifth and sixth rarest species, however, were distinctly patterned with respect to the treatment, so their removal sharply diminished the apparent effect size.

Figure 9.5. Response of the A statistic (blocked MRPP) to removal of rare species from small mammal trapping data. A measures the effect size of the treatments, in this case different stand structures.

Another objection to removal of rare species is that you cannot test hypotheses about whole-community structure if you exclude rare species. Certainly this is true for hypotheses about diversity. But it also applies to other measures of community structure. Statistical hypothesis tests are always, in some form or another, based on evaluating the relative strength of signal and noise. Because removal of rare species tends to reduce noise, the signal is more likely to be detected. This can be taken as an argument against removal of rare species, because it introduces a bias toward rejecting a null hypothesis. Alternatively, one can define beforehand the community of interest as excluding the rare species, and proceed without bias.

Mark Fulton (1998, unpublished) summarized the noise-versus-signal problem well:

Noise and information can only be defined in the context of a question of interest. An analogy: we are sitting in a noisy restaurant trying to have a conversation. From the point of view of our attempting to communicate, the ambient sound around us is "noise." Yet that noise carries all kinds of information: that clatter over to the left is the bus person clearing dishes at the next table; the laughter across the room is in response to the punchline of a fairly good joke; with a little attention you can hear what the two men in business suits two tables over are arguing about; and that rumble you just heard is a truck full of furniture turning the corner outside the restaurant. But none of this information is relevant to the conversation, and so we filter it out without thinking about the process much.

Vegetation analysis is a process of noise filtering right from the very start. Data collection

itself is a tremendous filtering process. We decide First difference of time series what NOT to measure Any transformations we do on the data — whether weighting, rescaling, or If your data form a time series (sample units are deletion of rare species — is also a filtering repeatedly evaluated at fixed locations), you may want process. Ordination itself is a further filter. The to ordinate the differences in abundance between patterns in the whole н-dimensional mess are of successive dates rather than the raw abundances: less interest than a carefully selected reduction of b n a u j . į - a ¡i.' those patterns. The point is. as scientists, we need to do this process of information selection and for a community sampled at times t and /+1 This is noise reduction carefully and with full knowledge simply the extension through time of the idea described of what we are doing. There is no single procedure in the preceding section. This transformation can be which will always bnng out the information of called a "first difference " (Allen et al. 1977) because it interest. Data selection, transformation, and analy­ is analogous to the first derivative of a time series sis can only be judged on how well they work in curve. With community data, a matrix of first differ­ relation to the questions at hand. ences represents changes in species composition. If we visualize changes in species composition as vectors in Combining entities species space, the matrix of differences represents the Aggregate sample units (SUs) can be created by lengths and directions of those vectors A matrix of averaging existing SUs. Each new entity is the "second" differences would represent the rates of "centroid" of the entities that you average. See Greig- acceleration (or deceleration) of sample units moving Smith (1983. p. 286) for comments on ordinating through species space. 
groups of SUs In general, community SUs should not The matrix of first differences takes into account be averaged unless they are very similar If SUs are the direction of compositional change. For example, heterogeneous, then the average species composition assume that the plankton in a lake go through two tends to fall outside the variation of the SUs. the particular compositional states in the fall, then go averages being unnaturally species-rich. through the same compositional states in the spring, If your rows are SUs. and vou also have an but in the opposite direction. The difference between environmental matrix, you should also calculate the two fall samples is not. therefore, the same as the centroids for environmental data. Be careful if you difference between the spring samples, even though the have categorical environmental variables. Depending absolute values of the differences are equal. Analyzing on how the categories are structured, averaging the the signed difference is logical, but other possibilities categories can be meaningless. exist. Allen et al. (1977) analyzed the absolute differences, creating a matrix of species' contributions Difference between two dates to community change, without regard to the direction of the change: Before-and-after data on species abundance b„ = ! η,,.,.ι - a ,, !, I obtained by revisiting the same SUs can be analyzed as differences, rather than the original quantities. If a„, If environmental variables are recorded at each date, you might analyze species change from time t to and a ,,2 are the abundances of species j in sample unit i at times 1 and 2. then the difference between dates is ir 1 in relationship to the state of the environment at time t. 
Alternatively, you could apply the first b„ ~ a,jz * a,ß difference transformation to the environmental The transformed data represents changes through variables as well, to analyze the question of how time Even with species abundance data, this transfor­ community change is related to environmental change. mation yields variables that are more or less normally On the other hand, variables that are constant through distributed with means near zero and with both posi­ time for a given sample unit through time (e.g., tive and negative numbers. After this transformation, location or treatment variables) could be retained be sure not to use methods that demand nonnegative without transformation. numbers: proportion coefficients (such as Sorensen) as Note that the statistical properties of these distance measures and techniques based on Correspon­ differences are radically different from the original dence Analysis (CA. RA. CCA, DCA. Twinspan). On data. For more information, see the preceding section the other hand. PCA and other techniques calling for on differences between two dates. multivariate normal data and linear relationships among variables will work far better on such a matrix than they would with either matrix alone. C 'hapter 9
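In matrix terms, both difference transformations are a single subtraction. The following is a minimal NumPy sketch; the array `a`, its dimensions, and all abundance values are hypothetical, not taken from the text:

```python
import numpy as np

# Hypothetical example: abundances of 3 species in 4 sample units
# recorded at three dates; axes are (time, sample unit, species).
a = np.array([
    [[5, 0, 2], [3, 1, 0], [8, 2, 1], [0, 4, 3]],  # time 1
    [[6, 1, 1], [2, 3, 0], [7, 2, 2], [1, 3, 3]],  # time 2
    [[4, 2, 0], [2, 4, 1], [9, 1, 2], [0, 5, 2]],  # time 3
], dtype=float)

# Difference between two dates: b_ij = a_ij,2 - a_ij,1
b_two_dates = a[1] - a[0]

# First differences of the whole series: b_ij,t = a_ij,t+1 - a_ij,t
b_first = np.diff(a, axis=0)  # shape (2, 4, 3)

# Allen et al. (1977) variant: absolute differences, ignoring the
# direction of compositional change.
b_abs = np.abs(b_first)

# The signed differences contain negatives, so Sorensen-type
# proportion coefficients and CA-based methods no longer apply.
print(b_two_dates)
```

Note that `np.diff` along the time axis reproduces the two-date difference as its first slice, so the two-date case is just a one-step time series.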

A general procedure for data adjustments

Species data

While one can easily grasp the logic of a particular data adjustment, the number of combinations and sequences can be bewildering. Although it is impossible to write a step-by-step cookbook that covers all possible data sets and goals, we suggest a general procedure for data adjustments that will be applicable to many community data sets (Table 9.3). For more detail on steps 2, 3, and 4, consult the preceding pages. For more detail on step 5, consult the section on outliers in Chapter 7.

The sequence of actions is important. For example, we check for outliers last, because many apparent outliers will disappear, depending on the monotonic transformations or relativizations that are used.
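To see why the outlier check comes last, consider a toy, entirely hypothetical set of sample-unit totals that grows geometrically. The z-score screen used here is a simple univariate stand-in for the multivariate average-distance check described below: on the raw scale the largest value screens as an outlier, but after a log transformation it sits comfortably within the ordinary range.

```python
import numpy as np

# Hypothetical row totals spanning four orders of magnitude.
totals = np.array([1., 3., 10., 30., 100., 300., 1000., 3000., 10000.])

def z_of_max(x):
    """Standard score of the largest value relative to the sample."""
    return (x.max() - x.mean()) / x.std(ddof=1)

z_raw = z_of_max(totals)            # ~2.5: flagged as an outlier
z_log = z_of_max(np.log10(totals))  # ~1.5: unremarkable after logs
print(round(z_raw, 2), round(z_log, 2))
```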

Table 9.3. Suggested procedure for data adjustments of species data matrices.

Action to be considered Criteria

1. Calculate descriptive statistics. Repeat this after each step below. (In PC-ORD, run Row & column summary.)

   Criteria: Always. Examine beta diversity (community data sets), average skewness of columns, and the coefficient of variation (CV, %): CV of row totals and CV of column totals.

2. Delete rare species (present in < 5% of sample units).

   Criteria: Usually applied to community data sets, unless contrary to study goals.

3. Monotonic transformation (if applied to species, then usually applied uniformly to all of them, so that all are scaled the same).

   Criteria: A. Average skewness of columns (species). B. Data range: over how many orders of magnitude? (Count and biomass data often are extreme.) C. Beta diversity. (Consider a presence/absence transformation for community data when β is high.)

4. Row or column relativizations.

   Criteria: What is the question? Are units for all variables the same? Is relativization built into the subsequent analysis? Examine the CV of row totals and the CV of column totals. What distance measure do you intend to use? Note: regardless of your decision to relativize or not, you should state your decision and justify it briefly on biological grounds.

5. Check for outliers based on the average distance of each point from all other points. Calculate the standard deviation of these average distances. Describe outliers and take steps to reduce their influence, if necessary.

   Criteria: standard deviations from the grand mean of the average distances indicate the degree of problem:
   < 2       no problem
   2-2.3     weak outlier
   2.3-3     moderate outlier
   > 3       strong outlier
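Assuming (hypothetically) that the species matrix is held as a NumPy array with sample units as rows and species as columns, steps 2 through 5 of Table 9.3 could be sketched as follows. The simulated data, and the choice of square root transformation and row-total relativization, are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical community matrix: 30 sample units x 12 species,
# skewed counts with many zeros.
x = rng.poisson(lam=rng.gamma(0.5, 4.0, size=(30, 12))).astype(float)

# Step 2: delete rare species (present in < 5% of sample units).
occupancy = (x > 0).mean(axis=0)
x = x[:, occupancy >= 0.05]

# Step 3: monotonic transformation, applied uniformly to all
# species (square root: milder than log, no zero problem).
x = np.sqrt(x)

# Step 4: row relativization by sample-unit totals, shifting the
# focus to relative abundances (guarding against empty rows).
row_totals = x.sum(axis=1, keepdims=True)
x = np.divide(x, row_totals, out=np.zeros_like(x), where=row_totals > 0)

# Step 5 (checked last): average Euclidean distance of each SU to
# all others, expressed in standard deviations of those averages.
d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
avg_d = d.sum(axis=1) / (len(x) - 1)
z = (avg_d - avg_d.mean()) / avg_d.std(ddof=1)
print("strong outliers (z > 3):", np.flatnonzero(z > 3))
```

After each step, descriptive statistics (step 1) would be recalculated; that bookkeeping is omitted here for brevity.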


Environmental data

Adjustments of environmental data depend greatly on their intended use, as indicated in Table 9.4. Categorical and binary variables in general need no adjustment, but one should always examine quantitative environmental variables.

Table 9.4. Suggested procedure for data adjustments of quantitative variables in environmental data matrices.

Action to be considered Criteria

1. Calculate descriptive statistics for quantitative variables. Repeat this after each step below. (In PC-ORD, run Row & column summary.)

   Criteria: Always. Examine the skewness and range of each variable (column).

2. Monotonic transformation (applied to individual variables, depending on need).

   Criteria: Consider a log or square root transformation for variables with skewness > 1 or ranging over several orders of magnitude. Consider an arcsine square root transformation for proportion data.

3. Column relativizations.

   Criteria: Consider column relativization (by norm or standard deviates) if environmental variables are to be used in a distance-based analysis that does not automatically relativize the variables (for example, using MRPP to answer the question: do groups of sample units defined by species differ in environmental space?). Column relativization is not necessary for analyses that use the variables one at a time (e.g., ordination overlays) or for analyses with built-in standardization (e.g., PCA of a correlation matrix).

4. Check for univariate outliers and take corrective steps if necessary.

   Criteria: Examine scatterplots or frequency distributions, or relativize by standard deviates ("z-scores") and check for high absolute values.
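As a companion sketch for Table 9.4, again assuming a NumPy array with sample units as rows: the three variables, the simulated values, and the moment-based skewness function are all hypothetical illustrations of the table's rules.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical environmental variables for 25 sample units:
# elevation (m), slope (deg), and soil P (strongly right-skewed).
env = np.column_stack([
    rng.uniform(100, 2000, 25),   # elevation
    rng.uniform(0, 30, 25),       # slope
    rng.lognormal(1.0, 1.2, 25),  # soil phosphorus
])

def skewness(v):
    """Simple moment-based skewness (g1)."""
    dev = v - v.mean()
    return (dev**3).mean() / (dev**2).mean()**1.5

# Step 2: log-transform columns with skewness > 1 (Table 9.4 rule).
for j in range(env.shape[1]):
    if skewness(env[:, j]) > 1:
        env[:, j] = np.log10(env[:, j])

# Step 3: column relativization to standard deviates ("z-scores"),
# so variables measured on different scales become comparable.
env = (env - env.mean(axis=0)) / env.std(axis=0, ddof=1)

# Step 4: univariate outlier screen on the standardized values.
print("cells with |z| > 3:", np.argwhere(np.abs(env) > 3))
```

Because the final relativization forces every column to mean 0 and unit standard deviation, the outlier screen in step 4 reduces to scanning the standardized matrix for large absolute values.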