688 NATURE April 30, 1949 Vol


688 NATURE April 30, 1949 Vol. 163

Measurement of Diversity

THE 'characteristic' defined by Yule¹ and the 'index of diversity' defined by Fisher² are two measures of the degree of concentration or diversity achieved when the individuals of a population are classified into groups. Both are defined as statistics to be calculated from sample data and not in terms of population constants. The index of diversity has so far been used chiefly with the logarithmic distribution. It cannot be used everywhere, as it does not always give values which are independent of sample size; it cannot do so, for example, when applied to an infinite population of individuals classified into a finite number of groups. Williams³ has pointed out a relationship between the characteristic and the index of diversity when both are applied to a logarithmic distribution. The present purpose is to define and examine a measure of concentration in terms of population constants.

Consider an infinite population such that each individual belongs to one of $Z$ groups, and let $\pi_1 \ldots \pi_Z$ ($\sum \pi = 1$) be the proportions of individuals in the various groups. Then $\lambda$ defined as $\sum \pi^2$ is a measure of the concentration of the classification. It can take any value between $1/Z$ and 1, the former representing the smallest concentration or largest diversity possible with $Z$ groups, and the latter complete concentration, all the individuals being in a single group. $\lambda$ can be simply interpreted as the probability that two individuals chosen at random and independently from the population will be found to belong to the same group.

Now suppose a sample of $N$ individuals to be chosen at random from a population of this kind, and let $n_1, n_2 \ldots n_Z$ ($\sum n = N$) be the numbers of individuals falling into the various groups. It is easily shown that

$$l = \frac{\sum n(n-1)}{N(N-1)}$$

is an unbiased estimator of $\lambda$; this is almost obvious since $\tfrac{1}{2}N(N-1)$ is the number of pairs in the sample and $\tfrac{1}{2}\sum n(n-1)$ is the number of pairs drawn from the same group. $l$ is also an unbiased estimate of $\lambda$ when the sample-size varies, provided no samples of size 0 or 1 are included and that the probability of the sample $(n_1, n_2 \ldots n_Z)$ splits into these two factors:

$$P(n_1 \ldots n_Z) = P(N)\,\frac{N!}{n_1!\, n_2! \cdots n_Z!}\,\pi_1^{n_1}\,\pi_2^{n_2} \cdots \pi_Z^{n_Z},$$

where $P(N)$ gives the probability distribution of the sample size, $2 \le N \le \infty$. This is true in particular when samples are obtained by the 'fixed-exposure' method common in biological work, $N$ having then a Poisson distribution adjusted for the absence of the first two terms.

If repeated samples of size $N$ are drawn from the same population, the values of $l$ obtained will be distributed about $\lambda$ with variance

$$\frac{4N(N-1)(N-2)\sum \pi^3 + 2N(N-1)\sum \pi^2 - 2N(N-1)(2N-3)\left(\sum \pi^2\right)^2}{[N(N-1)]^2};$$

or, if $N$ be very large, approximately

$$\frac{4}{N}\left[\sum \pi^3 - \left(\sum \pi^2\right)^2\right].$$

The third and fourth cumulants of the distribution of $l$ have also been calculated exactly. They indicate that as $N$ increases, the distribution tends to normality except when $\lambda = 1/Z$; in that case the distribution of $lNZ$ tends to that of $\chi^2$ with $Z - 1$ degrees of freedom, but with its mean moved from $Z - 1$ to $N$.

The characteristic defined by Yule¹ is, in the notation used above, $1{,}000 \sum n(n-1)/N^2$, which differs from $l$, the sample estimator of $\lambda$, only in having $N$ instead of $N - 1$ in the denominator and in the scale factor of 1,000.

Now let us see what value $\lambda$ takes for a population containing $Z$ groups the frequencies of which are $\pi_i = w_i / \sum w$, where the $w_i$ are chosen at random and independently from the Type III distribution

$$dF = \frac{1}{(k-1)!}\, e^{-w}\, w^{k-1}\, dw.$$

This may be called a 'negative binomial population', since samples drawn from it by the 'fixed exposure' method will obey the negative binomial distribution. The value of $\lambda$ appropriate to it is obtained by averaging $\sum w_i^2 / \left(\sum w_i\right)^2$ over all sets $(w_1, w_2 \ldots w_Z)$ which can be drawn from the population of values of $w$. Thus

$$\lambda = \int_0^\infty \cdots \int_0^\infty \frac{e^{-\sum w}\,[w_1 \cdots w_Z]^{k-1}}{[(k-1)!]^Z} \cdot \frac{\sum w_i^2}{\left(\sum w_i\right)^2}\; dw_1 \cdots dw_Z = \frac{k+1}{Zk+1}.$$

The Poisson distribution is the special case of the negative binomial distribution in which $k$ tends to infinity. Under this condition, $\lambda = 1/Z$. This is as we would expect, since the Poisson distribution arises from a population in which all groups are equally represented, and so the probability that two individuals chosen at random will be found to belong to the same group must be $1/Z$.

The other extreme case of the negative binomial is the logarithmic population, which is obtained by letting $Z$ tend to infinity and $k$ tend to zero simultaneously so that the product $Zk$ remains finite and tends to a quantity called $\alpha$. (This is not quite the same derivation as that used by Fisher², but the quantity $\alpha$ is the same as his index of diversity.) The value obtained for $\lambda$ under this limiting process is $1/(\alpha + 1)$.

It will be noticed that this last value is not consistent with the equation given by Williams³, namely, that Yule's characteristic had the value $1{,}000/\alpha$ when applied to the logarithmic distribution. His result was obtained by applying Yule's formula to a series of expected values, whereas the present procedure is equivalent to applying the formula first and then averaging the result. Some support for the new equation is found by considering the ranges of the variables concerned. Since the characteristic cannot exceed 1,000, the earlier equation would deny to $\alpha$ all values less than 1; but the present one allows it the range $0 \le \alpha \le \infty$, while $1 \ge \lambda \ge 0$.

E. H. SIMPSON
3 West End Avenue,
Pinner. Jan. 29.

¹ Yule, "Statistical Study of Literary Vocabulary" (Cambridge, 1944).
² Fisher, Corbet and Williams, J. Animal Ecol., 12, 42 (1943).
³ Williams, Nature, 157, 482 (1946).

© 1949 Nature Publishing Group.
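
The quantities defined in the letter can be computed directly. The following is a minimal Python sketch, offered only as an illustration: the function names are hypothetical and do not come from the letter, and exact rational arithmetic (`fractions.Fraction`) is an assumed convenience so that the formulas can be checked by hand. It covers the population concentration Σπ², the sample estimator l = Σn(n−1)/[N(N−1)], Yule's characteristic 1,000 Σn(n−1)/N², the exact variance of l for fixed N, and the value (k+1)/(Zk+1) for the 'negative binomial population'.

```python
from fractions import Fraction


def concentration(proportions):
    """Population concentration: the sum of squared group proportions."""
    return sum(p * p for p in proportions)


def estimate_l(counts):
    """Sample estimator l = sum n(n-1) / (N(N-1)); undefined for N < 2."""
    total = sum(counts)
    if total < 2:
        raise ValueError("l is undefined for samples of size 0 or 1")
    same_group_pairs = sum(n * (n - 1) for n in counts)
    return Fraction(same_group_pairs, total * (total - 1))


def yule_characteristic(counts):
    """Yule's characteristic in the letter's notation: 1000 sum n(n-1) / N^2."""
    total = sum(counts)
    return Fraction(1000 * sum(n * (n - 1) for n in counts), total * total)


def variance_of_l(proportions, size):
    """Exact sampling variance of l for a fixed sample size N = size."""
    s2 = sum(p ** 2 for p in proportions)
    s3 = sum(p ** 3 for p in proportions)
    pairs = size * (size - 1)  # N(N-1)
    numerator = (4 * pairs * (size - 2) * s3
                 + 2 * pairs * s2
                 - 2 * pairs * (2 * size - 3) * s2 ** 2)
    return numerator / pairs ** 2


def negative_binomial_concentration(k, groups):
    """Concentration (k + 1) / (Zk + 1) for Z = groups and parameter k."""
    k = Fraction(k)
    return (k + 1) / (groups * k + 1)
```

For a sample with counts 5, 3 and 2 (N = 10), `estimate_l([5, 3, 2])` gives 14/45 and `yule_characteristic([5, 3, 2])` gives 280, which is 1,000 l (N−1)/N as the letter's comparison of the two statistics implies. For two equally frequent groups and N = 2, l is 0 or 1 with equal probability, and `variance_of_l` returns 1/4 accordingly.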