688 NATURE April 30, 1949 Vol. 163
Measurement of Diversity

THE 'characteristic' defined by Yule¹ and the 'index of diversity' defined by Fisher² are two measures of the degree of concentration or diversity achieved when the individuals of a population are classified into groups. Both are defined as statistics to be calculated from sample data and not in terms of population constants. The index of diversity has so far been used chiefly with the logarithmic distribution. It cannot be used everywhere, as it does not always give values which are independent of sample size; it cannot do so, for example, when applied to an infinite population of individuals classified into a finite number of groups. Williams³ has pointed out a relationship between the characteristic and the index of diversity when both are applied to a logarithmic distribution. The present purpose is to define and examine a measure of concentration in terms of population constants.

Consider an infinite population such that each individual belongs to one of $Z$ groups, and let $\pi_1 \ldots \pi_Z$ ($\Sigma\pi = 1$) be the proportions of individuals in the various groups. Then $\lambda$ defined as $\Sigma\pi^2$ is a measure of the concentration of the classification. It can take any value between $1/Z$ and 1, the former representing the smallest concentration or largest diversity possible with $Z$ groups, and the latter complete concentration, all the individuals being in a single group. $\lambda$ can be simply interpreted as the probability that two individuals chosen at random and independently from the population will be found to belong to the same group.

Now suppose a sample of $N$ individuals to be chosen at random from a population of this kind, and let $n_1, n_2 \ldots n_Z$ ($\Sigma n = N$) be the numbers of individuals falling into the various groups. It is easily shown that

$$l = \frac{\Sigma n(n-1)}{N(N-1)}$$

is an unbiased estimator of $\lambda$; this is almost obvious since $\tfrac{1}{2}N(N-1)$ is the number of pairs in the sample and $\tfrac{1}{2}\Sigma n(n-1)$ is the number of pairs drawn from the same group. $l$ is also an unbiased estimate of $\lambda$ when the sample size varies, provided no samples of size 0 or 1 are included and that the probability of the sample $(n_1, n_2 \ldots n_Z)$ splits into these two factors:

$$P(n_1 \ldots n_Z) = P(N)\,\frac{N!}{n_1!\,n_2! \ldots n_Z!}\,(\pi_1)^{n_1}(\pi_2)^{n_2} \ldots (\pi_Z)^{n_Z},$$

where $P(N)$ gives the probability distribution of the sample size, $2 \le N \le \infty$. This is true in particular when samples are obtained by the 'fixed-exposure' method common in biological work, $N$ having then a Poisson distribution adjusted for the absence of the first two terms.

If repeated samples of size $N$ are drawn from the same population, the values of $l$ obtained will be distributed about $\lambda$ with variance

$$\frac{4N(N-1)(N-2)\,\Sigma\pi^3 + 2N(N-1)\,\Sigma\pi^2 - 2N(N-1)(2N-3)\,(\Sigma\pi^2)^2}{[N(N-1)]^2};$$

or, if $N$ be very large, approximately

$$\frac{4}{N}\left[\Sigma\pi^3 - (\Sigma\pi^2)^2\right].$$

The third and fourth cumulants of the distribution of $l$ have also been calculated exactly. They indicate that as $N$ increases, the distribution tends to normality except when $\lambda = 1/Z$; in that case the distribution of $lNZ$ tends to that of $\chi^2$ with $Z - 1$ degrees of freedom, but with its mean moved from $Z - 1$ to $N$.

The characteristic defined by Yule¹ is, in the notation used above, $1{,}000\,\Sigma n(n-1)/N^2$, which differs from $l$, the sample estimator of $\lambda$, only in having $N$ instead of $N - 1$ in the denominator and in the scale factor of 1,000.

Now let us see what value $\lambda$ takes for a population containing $Z$ groups the frequencies of which are $\pi_i = w_i/\Sigma w$, where the $w_i$ are chosen at random and independently from the Type III distribution

$$dF = \frac{1}{(k-1)!}\,e^{-w}\,w^{k-1}\,dw.$$

This may be called a 'negative binomial population', since samples drawn from it by the 'fixed exposure' method will obey the negative binomial distribution. The value of $\lambda$ appropriate to it is obtained by averaging $\Sigma w_i^2/(\Sigma w_i)^2$ over all sets $(w_1, w_2 \ldots w_Z)$ which can be drawn from the population of values of $w$. Thus

$$\lambda = \int_0^\infty \!\!\cdots\! \int_0^\infty \frac{e^{-\Sigma w}\,[w_1 \ldots w_Z]^{k-1}}{[(k-1)!]^Z}\cdot\frac{\Sigma w_i^2}{(\Sigma w_i)^2}\;dw_1 \ldots dw_Z = \frac{k+1}{Zk+1}.$$

The Poisson distribution is the special case of the negative binomial distribution in which $k$ tends to infinity. Under this condition, $\lambda = 1/Z$. This is as we would expect, since the Poisson distribution arises from a population in which all groups are equally represented, and so the probability that two individuals chosen at random will be found to belong to the same group must be $1/Z$.

The other extreme case of the negative binomial is the logarithmic population, which is obtained by letting $Z$ tend to infinity and $k$ tend to zero simultaneously so that the product $Zk$ remains finite and tends to a quantity called $\alpha$. (This is not quite the same derivation as that used by Fisher², but the quantity $\alpha$ is the same as his index of diversity.) The value obtained for $\lambda$ under this limiting process is $1/(\alpha + 1)$.

It will be noticed that this last value is not consistent with the equation given by Williams³, namely, that Yule's characteristic had the value $1{,}000/\alpha$ when applied to the logarithmic distribution. His result was obtained by applying Yule's formula to a series of expected values, whereas the present procedure is equivalent to applying the formula first and then averaging the result. Some support for the new equation is found by considering the ranges of the variables concerned. Since the characteristic cannot exceed 1,000, the earlier equation would deny to $\alpha$ all values less than 1; but the present one allows it the range $0 \le \alpha \le \infty$, while $1 \ge \lambda \ge 0$.

E. H. SIMPSON
3 West End Avenue,
Pinner.
Jan. 29.

¹ Yule, "Statistical Study of Literary Vocabulary" (Cambridge, 1944).
² Fisher, Corbet and Williams, J. Animal Ecol., 12, 42 (1943).
³ Williams, Nature, 157, 482 (1946).

© 1949 Nature Publishing Group.
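
The quantities defined in this note lend themselves to a small numerical check. The following is a minimal Python sketch, not part of the original communication: the function names are illustrative assumptions, and exact rational arithmetic is used so that the unbiasedness of l and the variance formula can be confirmed by brute-force enumeration of every sample of a small size N.

```python
from fractions import Fraction
from itertools import product


def concentration(props):
    """Population concentration: lambda = sum of pi^2."""
    return sum(p * p for p in props)


def estimate_l(counts):
    """Sample estimator l = sum n(n-1) / (N(N-1)); requires N >= 2."""
    N = sum(counts)
    if N < 2:
        raise ValueError("sample size must be at least 2")
    return Fraction(sum(n * (n - 1) for n in counts), N * (N - 1))


def yule_characteristic(counts):
    """Yule's characteristic in the note's notation: 1000 * sum n(n-1) / N^2."""
    N = sum(counts)
    return 1000 * Fraction(sum(n * (n - 1) for n in counts), N * N)


def variance_of_l(props, N):
    """Exact sampling variance of l for fixed sample size N, as stated in the note."""
    s2 = sum(p ** 2 for p in props)
    s3 = sum(p ** 3 for p in props)
    numerator = (4 * N * (N - 1) * (N - 2) * s3
                 + 2 * N * (N - 1) * s2
                 - 2 * N * (N - 1) * (2 * N - 3) * s2 ** 2)
    return numerator / (N * (N - 1)) ** 2


def negative_binomial_lambda(k, Z):
    """lambda = (k + 1) / (Zk + 1) for the 'negative binomial population'."""
    return Fraction(k + 1) / (Z * k + 1)


def exact_moments(props, N):
    """Mean and variance of l, enumerating all Z**N ordered samples (small N only)."""
    Z = len(props)
    mean = Fraction(0)
    second = Fraction(0)
    for draw in product(range(Z), repeat=N):
        prob = Fraction(1)
        for g in draw:
            prob *= props[g]
        counts = [draw.count(g) for g in range(Z)]
        l = estimate_l(counts)
        mean += prob * l
        second += prob * l * l
    return mean, second - mean ** 2


if __name__ == "__main__":
    props = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    mean, var = exact_moments(props, N=3)
    print(mean == concentration(props))      # unbiasedness check
    print(var == variance_of_l(props, 3))    # variance formula check
    print(estimate_l([3, 2, 1]))             # 4/15
```

Enumeration costs Z**N evaluations, so it is only practical for very small samples; for realistic N the closed-form variance is the useful quantity.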