A Statistical Derivation of the Significant-Digit
Total Page:16
File Type:pdf, Size:1020Kb
THE SIGNIFICANT-DIGIT LAW 355 A perhapssurprising corollary of the general law the constants such as speed of light and force of (4) is that (CL(cf.Hill, 1995b) gravity listed on the inside cover of an introducintroduc- tory physics textbook), about 30% have leading the significant digits are dependent significant digit 1. Becker (1982) observed that the decimal parts of failure (hazard) rates often have a logarithmic distribution, and Buck, Merchant and and not independent as one might expect. and not independentas one mightexpect. Perez (1993), in studying the values of the 477 rara- From (2) it follows that the (unconditional) prob From(2) it followsthat the (unconditional)prob- dioactive half-lives of unhindereda decays which ability that the second digit is 2 is ~ 0.109, but by abilitythat the seconddigit is 2 is 0.109, but by have been accumulated throughout the present (4) the (conditional) probability that the second digit (4) the(conditional) probability that the seconddigit century and which vary over many orders of magmag- is 2, given that the first digit is 1, is ~_ 0.115. This de is 2,given thatthe first digit is 1, is 0.115. This de- nitude, found that the frequency of occurrence of pendence among significant digits decreases rapidly pendenceamong significant digits decreases rapidly the first digits of both measured and calculated valval- as the distance between the digits increases, and it as the distancebetween the digitsincreases, and it ues of the half-lives is in "good agreement" with follows easily from the general law (4) that the dis followseasily fromthe generallaw (4) that the dis- Benford's law. tribution ofthe nth significant digit approaches the tributionof the nthsignificant digit approaches the In scientific calculations the assumption of loglog- uniform distribution on {O, 1, ... ,9} exponentially uniformdistribution on {0, 1, .. , 9} exponentially arithmically distributed mantissae "is widely used fast as n ~-? 00. This article will concentrate on deci fast as n oo. This article will concentrateon deci- and well established" (Feldstein and Turner, 1986, mal (base-l0) representations and significant digits; mal (base-10)representations and significantdigits; page 241), and as early as a quarter-century ago, the corresponding analog of(3) for other bases b > 1 the correspondinganalog of(3) forother bases b > 1 Hamming (1970, page 1609) called the appearance is simply Prob (mantissa (base b) :::< tlb) = 10gb t for is simplyProb (mantissa(base b) t/b) logbt for of the logarithmic distribution in floating-point all t E [1, b).] all t E [1, b).] numbers "well-known." Benford-like input is often a common assumption for extensive numerical calcal- culations (Knuth, 1969), but Benford-like output EMPIRICAL EVIDENCE is also observed even when the input has random Of course, many tables of numerical data do (non-Benford) distributions. Adhikari and Sarkar not follow this logarithmic distribution-lists of (1968) observed experimentally "that when ranran- telephone numbers in a given region typically be-be dom numbers or their reciprocals are raised to gin with the same few digits-and even "neutral" higherand higher powers, they have log distribudistribu- data such as square-root tables of integers are tion of most significant digit in the limit." Schatte not good fits. However, a surprisingly diverse colcol- (1988, page 443) reports that "In the course of lection of empirical data does seem to obey the a sufficiently long computation in floating-point significant-digit law. arithmetic, the occurring mantissas have nearly Newcomb(1881) noticed "how much faster the logarithmic distribution." first pages [of logarithmic tables] wear out than Extensive evidence of the significant-digit law the last ones," and after several short heuristics, has also surfaced in accounting data. Varian (1972) concluded the equiprobable-mantissae law. Some studied land usage in 777 tracts in the San FranFran- 57 years later the physicistFrank Benford redisredis- cisco Bay area and concluded ''As"As can be seen, coveredthe law and supported it with over 20,000 boththe input data and the forecasts are in fairly entriesfrom 20 different tables including such didi- good accord with Benford's Law." Nigrini and Wood verse data as areas of 335 rivers, specific heats of (1995) show that the 1990 census populations of 1389 chemical compounds, American League base-base .the 3141 counties in the United States "follow ball statistics and numbers gleaned from Reader's Benford's Law very closely," and Nigrini (1996) calcal- Digest articles and front pages of,newspapers. newspapers.AlAl- culated that the digital frequencies of income tax though Diaconis and Freedman (1979, page 363) data reported to the Internal Revenue Service of offer/convincingofferconvincing evidence that Benford manipulated interest received and interest paid is an extremely round-off errors.errors to obtain a betterfit to the logaloga- good fit to Benford. Ley (1995) found "that the sese- rithmic law, even the unmanipulateddata are a ries ofone-day returns on the Dow-Jones Industrial remarkably good fit. Newcomb's article having been AverageIndex (DJIA) and the Standard and Poor's overlooked, the law also becameknown as Benford's Index (S&P) reasonably agrees with Benford's law." law. All these statistics aside, the author also highly Since Benford's popularization of the law, an recommendsthat the justifiablyskeptical reader abundance of additional empirical evidence has performa simple experiment, such as randomly appeared. In physics,for example, Knuth (1969) selecting numerical data from front pages of sevsev- and Burke and Kincanon (1991) observed that of eral local newspapers, "or a Farmer's Almanack" as the most commonly used physicalconstants (e.g., Knuth(1969) suggests. 356 T. P. HILL CLASSICAL EXPLANATIONS One popular assumption in this context has been that of scale invariance, which corresponds to the Since the empirical significant-digit law (4) does intuitively attractive idea that any universal law not specify a well-defined statistical experiment or should be independent of units (e.g., metric or EnEn- sample space, most attempts to prove the law have glish). The problem here, however, as Knuth (1969) been purely mathematical (deterministic) in nature, observed, is that there is no scale-invariant Borel attempting to show that the law "is a built-in charchar- probability measure on the positive reals since then acteristic of our number system," as Weaver (1963) the probabilityof the set (0, 1) would equal that of called it. The idea was to prove first that the set (0, s) for all s, which again would contradict countcount- of real numbers satisfies (4), and then suggest that able additivity. (Raimi, 1976, has a good review of this explains the empirical statistical evidence. many of these arguments.) Just as with the denden- A common starting point has been to try to estabestab- sity proofs for the integers, none of these methods lish (4) for the positive integers NJ,N, beginning with yielded either a true probabilisticlaw or any statisstatis- the prototypicalset {D{D1 = I}1} = {I,{1, 10, 11, 12, 13, 1 tical insights. 14, ..... .,, 19, 100, 101, ..... .},}, the set of positive integers Attempts to prove the law based on various with leading significant digit 1. The source of diffidiffi- urn schemes for picking significant digits at ranran- culty and much of the fascination of the problem dom have been equally unsuccessful in general, is that this set {D{D1 = I}1} does not have a natural 1 although in some restricted settings log-limit laws density among the integers, that is, have been established. Adhikari and Sarkar (1968) proved that powers of a uniform(0,1)(0, 1) random varivari- 1 lim .!-{D1{D 1 = l}1}fn{1,2,...,n}n {l, 2, ... , n} able satisfy Benford's law in the limit, Cohen and noon~oo n n Katz (1984) showed that a prime chosen at ranran- dom with respect to the zeta distribution satisfies does not exist, unlike the sets of even integers or does not exist, unlike the sets of even integers or the logarithmic significant-digit law and Schatte primes which have natural densities 1/2 and 0, re primes which have natural densities 1/2 and 0, re- (1988) established convergence to Benford's law spectively. It is easy to see that the empirical density spectively.It is easy to see that the empirical density for sums and products of certain nonlattice i.i.d. of {D{D11 = I}1} oscillates repeatedly between 1/9 and variables. 5/9, and thus it is theoretically possible to assign any number in [1/9, 5/9] as the "probability" of this set. Flehinger (1966) used a reiterated-averaging technique to define a generalized density which asas- THE NATURAL PROBABILITY SPACE signs the "correct" Benford value log10log1o2 to {D{D11 = 1},I}, Cohen (1976) showed that "any generalization The task of putting the significant-digit law into of natural density which applies to the [significant a proper countably additive probabilityframework digit sets] and which satisfies one additional concon- is actually rather easy. Since the conclusion of the dition must assign the value loglog1o10 2 to [{[{D1D 1 = I}]"1}] law (4) is simply a statement about the significantsignificant- and Jech (1992) found necessary and sufficient concon- digit functions (random variables) D1,D 1 , D2,D 2 , •••.. .,, let ditions for a finitely-additive set function to be the the sample space be ~+,OR, the set of positive reals, log function. None of these solutions, however, rere- and let the sigma algebra of events simply be the sulted in a true (countably additive) probability,the IT-fieldo-field generated by {D{D1,1 , D2,D 2 , •.•.