An extended Benford analysis of exoplanet orbital periods

PATRICK VANOUPLINES 1,*

1 University Library – Vrije Universiteit Brussel, 1050 Brussels, Belgium.
*Corresponding author. E-mail: [email protected]

Abstract. Shukla, Pandey and Pathak (2017) report their findings of a Benford analysis applied to the physical properties of exoplanets. The present paper gives a short literature overview of previous research on exoplanets. We describe the methods to perform an extended Benford analysis, which considers not only the first digit but also other digits and digit combinations. Methods for testing conformity with the Benford distribution are discussed and applied to the digits of orbital data of exoplanets. A first result of this research is that the data used pass most of the tests. For most tests on the orbital period values, the almost 4,000 presently known and confirmed exoplanets appear to be a sufficient sample. The analysis of the first, second and third digits (and combinations of these digits) shows good agreement with the Benford distribution. The analysis of the last two digits indicates that the last significant zero is easily lost during the export from the exoplanet database. The summation analysis isolates exoplanets with extremely long orbital periods.

Keywords. Benford’s law – Benford analysis – exoplanets – orbital period.

1. Introduction

Only a small number of publications discuss the use of Benford’s law in astronomy. The paper by Shukla, Pandey and Pathak (2017) on the evaluation of the validity of Benford’s law regarding properties of extrasolar planets was the inspiration for the more in-depth study on which we report here. We refer to the previous paper as SPP2017.

1.1 Benford’s law and the Benford distribution

Benford’s law is not new in scientific literature. Already at the end of the nineteenth century, the astronomer Simon Newcomb noticed that the first pages of books containing logarithm tables were more worn than the other pages. Newcomb (1881) proposed a law expressing the relation between the frequency of numbers starting with a smaller digit and numbers starting with a higher digit: he stated that the probability (as a fraction) of a number starting with digit d (from 1 to 9, the leading zero not being significant) is given by log(d + 1) − log(d). He went even further in his paper: he also tabulated the probabilities of the second digit (from 0 to 9). About the third and fourth digits he writes: “In the case of the third figure the probability will be nearly the same for each digit, and for the fourth and following ones the difference will be inappreciable”. The last sentence of Newcomb’s paper is “It is curious to remark that this law would enable us to decide whether a large collection of independent numerical results were composed of natural numbers or logarithms”.

Newcomb’s paper remained unnoticed for a few decades. Frank Benford (1938), apparently unaware of Newcomb’s publication, rediscovered the phenomenon (again inspired by the wear and tear of logarithm tables) and tested data sets such as the surface areas of rivers, the sizes of US populations, physical constants, molecular weights, mathematical constants, numbers contained in an issue of Reader's Digest, street addresses of persons listed in American Men of Science and death rates, in total covering more than 20,000 observations. Benford also concludes that “The frequency of first digits thus follows closely the logarithmic relation”. When Benford considers the other digits, he writes that the previous digits should also be taken into account. He arrives at formulas describing the distributions of the digits, similar to the equations in the present paper.

For many data sets, often referred to as ‘natural data’, the proportion of the smaller digits is much bigger than the proportion of the larger digits. Counterintuitively, following the Benford distribution, almost fifty per cent of the numbers start with digit 1 or 2. In the present paper we use a consistent notation, not only for the individual digits, but also for combinations of digits.

A Benford analysis looks at the individual digits in numbers. Take, for example, the number 65.4321:

• 6 is the first digit,
• 5 is the second digit,
• 4 is the third digit,
• and so on.

Considering multiple digits:

• the first two digits are 65,
• the second two digits are 54,
• the first three digits are 654,
• the last two digits are 21.

The number 654,321 is, for a Benford analysis, equivalent to 6.54321 and also to 0.0654321.

Besides, we also consider the last two digits, and we perform a summation analysis. These two analyses were introduced in the field of financial auditing, see Nigrini (2012). We call this group of analyses the extended Benford analysis, while we realize that the analysis of the last two digits and the summation analysis are, in fact, not a Benford analysis.

1.2 Previous digit analysis of exoplanet data

The authors of SPP2017 state in their conclusion: “The validity of Benford’s law is investigated for the first time for exoplanets.”. This is not entirely true. Sambridge, Tkalčić, and Jackson (2010) performed a Benford analysis on the masses of exoplanets, among other physical data sets. Their data set contained only 401 exoplanets at the time of their investigation. It is interesting to note that in the exoplanet mass data, as the authors describe it, a “bump” occurs. There is an excess of values where the first digit is 6 (9.5%, where the Benford proportion is 6.7%). The authors state that this “difference is subject to both sampling and observational error but would correspond to an excess of 11 planets being erroneously assigned a mass with first digit 6.”. Hair (2014) reports on his study of exoplanet masses, with 758 confirmed exoplanets and 3455 Kepler candidate exoplanets (contained in the data set in November 2013). There remains a bump in the exoplanet mass data for the first digit 6, although it is smaller (8.6%). Kossovsky (2012) describes his investigation of a dataset of early September 2012, with 834 exoplanets, and he discusses the same subject later in somewhat more detail (Kossovsky, 2015, p 34-35, and p 132). This author gives Benford analyses for exoplanets’ mass, angular distance, semi-major axis size, orbital eccentricity, and orbital period. His results are summarized in Figure 1.

[Figure 1 chart: “Benford first digit analysis of exoplanet data”. Horizontal axis: first digit, 1 to 9. Vertical axis: proportion, 0% to 35%. Series: Benford, Planet's mass, Angular distance, Semi-major axis size, Orbital eccentricity, Orbital period.]

Figure 1. Graphical representation of Kossovsky’s Benford first digit analysis of data about the mass, angular distance, semi-major axis, orbital eccentricity, and orbital period of 834 exoplanets (after Kossovsky 2015, p 35).

2. Extended Benford analysis methods

The authors of SPP2017 write (p 7) that they investigate whether the second most significant digit (for orbital periods of extrasolar planets) also follows Benford’s law, but what the authors mean, in fact, corresponds to the first two digits (FTD).

2.1 Requirements for Benford analyses

In order to perform a reliable Benford analysis, data should at least: span several orders of magnitude, consist of thousands of values, and contain neither fixed nor rounded values. Further descriptions of the requirements for a potentially Benford-compliant data set can be found in Nigrini (2012, p 21-23) and Miller (2015, p 193).

2.2 Types of digit analyses

This section gives an overview of the Benford probabilities for the most significant digits and for the last two digits. Besides these probabilities, the summation analysis is also described; this analysis is often used in accounting and fraud detection.

To express the Benford probabilities we use the notation of Kossovsky (2015). In this notation, the probability of the first digit (FD) d is calculated by

$$P[\text{1st digit is } d] = \log_{10}\left(1 + \frac{1}{d}\right) \qquad (1)$$

For the first two digits (FTD) and the first three digits (F3D) the equations are similar:

$$P[\text{1st digit is } p \text{ AND 2nd digit is } q] = \log_{10}\left(1 + \frac{1}{pq}\right) \qquad (2)$$

and

$$P[\text{1st digit is } p \text{ AND 2nd digit is } q \text{ AND 3rd digit is } r] = \log_{10}\left(1 + \frac{1}{pqr}\right) \qquad (3)$$

In these equations and in those that follow, juxtaposed digits such as pq, pqr, Dk, DKm and Dpq denote the number formed by writing the digits side by side (p = 6 and q = 5 give pq = 65), not a product.

The form of the equations changes when the probabilities are calculated for digits that do not include the first digit (i.e. when the first digit can take any value 1..9). The second digit (SD) probabilities are calculated with

$$P[\text{2nd digit} = k \mid \text{any 1st digit}] = \sum_{D=1}^{9} \log_{10}\left(1 + \frac{1}{Dk}\right) \qquad (4)$$

Similar, but slightly more complicated, is the calculation of the probability of the third digit (while the first digit can take any value 1..9 and the second digit any value 0..9). The third digit (3D) probabilities are calculated with

$$P[\text{3rd digit} = m \mid \text{any 1st, any 2nd digit}] = \sum_{D=1}^{9} \sum_{K=0}^{9} \log_{10}\left(1 + \frac{1}{DKm}\right) \qquad (5)$$

Obviously we may also calculate the probabilities of the second and the third digits combined (while the first digit can take any value 1..9). Note that the calculation of the probability of this combination of digits is not mentioned by Kossovsky (2015). This paper is probably the first in which this higher-order Benford probability is tested on natural data. The second two digits (S2D) probabilities are calculated with:

$$P[\text{2nd digit} = p \text{ AND 3rd digit} = q \mid \text{any 1st digit}] = \sum_{D=1}^{9} \log_{10}\left(1 + \frac{1}{Dpq}\right) \qquad (6)$$

Calculating even higher-order probabilities is of little relevance, because such a higher-order distribution is almost flat and all probabilities are too close to an equal distribution over all bins of digits.
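The digit definitions of Section 1.1 and the probabilities of equations (1) to (6) can be illustrated with a minimal Python sketch. It is not the code used for the analysis in this paper. The function names (`leading_prob`, `sd_prob`, `third_prob`, `s2d_prob`, `significant_digits`) are hypothetical, and the sketch assumes that juxtaposed digits denote concatenation (for example, Dk is the number 10·D + k) and that all logarithms are base 10.

```python
import math
from decimal import Decimal


def leading_prob(n):
    """Eqs. (1)-(3): probability that the leading digits form the integer n.

    n = 6 gives the first-digit (FD) case, n = 65 the first-two-digits
    (FTD) case, n = 654 the first-three-digits (F3D) case.
    """
    return math.log10(1 + 1 / n)


def sd_prob(k):
    """Eq. (4): probability that the second digit is k (0..9)."""
    return sum(math.log10(1 + 1 / (10 * D + k)) for D in range(1, 10))


def third_prob(m):
    """Eq. (5): probability that the third digit is m (0..9)."""
    return sum(
        math.log10(1 + 1 / (100 * D + 10 * K + m))
        for D in range(1, 10)   # first digit: 1..9
        for K in range(10)      # second digit: 0..9
    )


def s2d_prob(p, q):
    """Eq. (6): probability that the second digit is p and the third is q."""
    return sum(math.log10(1 + 1 / (100 * D + 10 * p + q)) for D in range(1, 10))


def significant_digits(value):
    """Return the digits of value as a string, ignoring sign, decimal
    point, and leading zeros (assumed helper, not from the paper)."""
    digits = Decimal(str(value)).as_tuple().digits
    return "".join(str(d) for d in digits).lstrip("0")


# Digit positions for the example number 65.4321 from Section 1.1.
s = significant_digits("65.4321")
first_two, second_two, first_three, last_two = s[:2], s[1:3], s[:3], s[-2:]
```

With these definitions, `leading_prob(6)` evaluates to about 0.067, the 6.7% Benford proportion for a first digit 6 quoted in Section 1.2, and each of the six distributions sums to 1 over its bins. Passing values as strings to `significant_digits` preserves trailing zeros, which matters for the last two digits.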