
Acta Astronautica 115 (2015) 277–285


Acta Astronautica

journal homepage: www.elsevier.com/locate/actaastro

Statistical Drake–Seager Equation for exoplanet and SETI searches☆

Claudio Maccone a,b,c,*

a International Academy of Astronautics (IAA), Via Martorelli 43, Torino (Turin) 10155, Italy
b SETI Permanent Committee of the IAA, Via Martorelli 43, Torino (Turin) 10155, Italy
c IASF-INAF Associate, Milan, Italy

Article history:
Received 14 November 2014
Received in revised form 29 April 2015
Accepted 1 May 2015
Available online 22 May 2015

Keywords:
Statistical
Statistical Seager Equation
Lognormal probability densities

Abstract

In 2013, MIT astrophysicist Sara Seager introduced what is now called the Seager Equation (Refs. [20,21]): it expresses the number N of exoplanets with detectable signs of life as the product of six factors: $N_s$ = the number of stars observed, $f_Q$ = the fraction of stars that are quiet, $f_{HZ}$ = the fraction of stars with rocky planets in the Habitable Zone, $f_O$ = the fraction of those planets that can be observed, $f_L$ = the fraction that have life, $f_S$ = the fraction on which life produces a detectable signature gas. This we call the “classical Seager equation”.

Now suppose that each input of that equation is a positive random variable, rather than a sheer positive number. As such, each input random variable has a positive mean value and a positive variance that we assume to be numerically known by scientists. This we call the “Statistical Seager Equation”. Taking the logs of both sides of the Statistical Seager Equation, the latter is converted into an equation of the type log(N) = SUM of independent random variables.

Let us now consider the possibility that, in the future, the number of physical inputs considered by Seager when she proposed her equation will actually increase, since scientists will know more and more details about the of exoplanets. In the limit of an infinite number of inputs, i.e. an infinite number of independent input random variables, the Central Limit Theorem (CLT) of Statistics applies to the Statistical Seager Equation. Thus, the probability density function (pdf) of the output random variable log(N) will approach a Gaussian (normal) distribution in the limit, whatever the distributions of the input random variables might be.

But if log(N) approaches the normal distribution, then N approaches the lognormal distribution, whose parameter μ is the sum of the mean values of the logs of the inputs and whose parameter σ² is the sum of the variances of those logs. This is just what this author realized back in 2008 when he transformed the Classical Drake Equation into the Statistical Drake Equation (Refs. [10,11]). This discovery led to much more related work in the following years (Refs. [12–19]). In this paper we study the lognormal properties of the Statistical Seager Equation, relating them to the present and future knowledge for exoplanet searches from both the ground and space.

© 2015 IAA. Published by Elsevier Ltd. All rights reserved.

☆ This paper was presented during the 65th IAC in .
* Corresponding author at: SETI Permanent Committee of the IAA, Via Martorelli 43, Torino (Turin) 10155, Italy.
E-mail addresses: [email protected], [email protected]
URL: http://www.maccone.com/
http://dx.doi.org/10.1016/j.actaastro.2015.05.005
0094-5765/© 2015 IAA. Published by Elsevier Ltd. All rights reserved.

1. Introduction

As we stated in the Abstract, the Seager equation is mathematically equivalent to the Drake equation well known in SETI, the Search for ExtraTerrestrial Intelligence. Actually, every equation whose output is simply the product of some independent inputs is equivalent to the Drake equation. For instance, the Dole equation, which applies to the number of habitable planets for man in the Galaxy, was studied by this author in Ref. [13] and in Chapter 3 of Ref. [15] in exactly the same mathematical way as the Drake equation was studied earlier by him in Refs. [10,11], and as the Seager equation is studied in this paper. More prosaically, all these equations are simply the Law of Compound Probability that everyone learns about in every course on elementary probability theory. Sara Seager herself modestly called her equation “an extended Drake equation”. Therefore, we prefer to describe the following important transition from the classical equation to the statistical one in the language of SETI, and so we now introduce first the classical, and later the Statistical Drake equation.

2. The classical Drake equation (1961)

The Drake equation is a now famous result (see Ref. [1] for the Wikipedia summary) in the fields of SETI (the Search for ExtraTerrestrial Intelligence, see Ref. [2]) and Astrobiology (see Ref. [3]). Devised in 1961, the Drake equation was the first scientific attempt to estimate the number N of ExtraTerrestrial civilizations in the Galaxy with which we might come in contact. Frank D. Drake (see Ref. [4]) proposed it as the product of seven factors:

$$N = N_s \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot f_L \tag{1}$$

where

1) $N_s$ is the estimated number of stars in our Galaxy.
2) $f_p$ is the fraction (= percentage) of such stars that have planets.
3) $n_e$ is the number of “Earth-type” planets around a given star; in other words, $n_e$ is the number of planets, in a given stellar system, on which the chemical conditions exist for life to begin its course: they are “ready for life”.
4) $f_l$ is the fraction (= percentage) of such “ready for life” planets on which life actually starts and grows up (but not yet to the “intelligence” level).
5) $f_i$ is the fraction (= percentage) of such “planets with life forms” that actually evolve until some form of “intelligent civilization” emerges (like the first, historic human civilizations on Earth).
6) $f_c$ is the fraction (= percentage) of such “planets with civilizations” where the civilizations evolve to the point of being able to communicate across the interstellar distances with other (at least) similarly evolved civilizations. As far as we know in 2015, this means that they must be aware of the Maxwell equations governing radio waves, as well as of computers and radio astronomy (at least).
7) $f_L$ is the fraction of galactic civilizations alive at the time when we, poor humans, attempt to pick up their radio signals (that they throw out into space just as we have done since 1900, when Marconi started the transatlantic transmissions). In other words, $f_L$ is the fraction of civilizations now transmitting and receiving, and this implies an estimate of “how long will a technological civilization live?” that nobody can make at the moment. Also, are they going to destroy themselves in a nuclear war, and thus live only a few decades of technological civilization? Or are they slowly becoming wiser, rejecting war, speaking a single language (like English today), and merging into a single “nation”, thus living in peace for ages? Or will robots take over one day, making “flesh animals” disappear forever (the so-called “post-biological universe”)?

No one knows…

But let us go back to the Drake equation (1).

In the fifty years of its existence, a number of suggestions have been put forward about the different numeric values of its seven factors. Of course, every different set of these seven input numbers yields a different value for N, and we can endlessly play that way. But we claim that these are like… child's play!

3. Transition from the classical to the Statistical Drake equation

We claim that the classical Drake equation (1), as we shall call it from now on to distinguish it from our Statistical Drake equation introduced in the coming sections, is scientifically inadequate in at least one regard: it handles sheer numbers only and does not associate an error bar with each of its seven factors. At the very least, we want to associate an error bar with each input variable appearing on the right-hand side of (1).

We have thus reached STEP ONE in our improvement of the classical Drake equation: replace each sheer number by a probability distribution!

4. Step 1: letting each factor become a random variable

In this paper we adopt the notations of the great book “Probability, Random Variables and Stochastic Processes” by Athanasios Papoulis (1921–2002), now re-published as Papoulis-Pillai, Ref. [5]. The advantage of this notation is that it makes a neat distinction between probabilistic (or statistical: it is the same thing here) variables, always denoted by capitals, and non-probabilistic (or “deterministic”) variables, always denoted by lower-case letters. Adopting the Papoulis notation also is a tribute to him by this author, who was a Fulbright Grantee in the United States with him at the Polytechnic Institute (now Polytechnic University) of New York in the years 1977–1979.

We thus introduce seven new (positive) random variables $D_i$ (“D” from “Drake”) defined as

$$\begin{cases} D_1 = N_s \\ D_2 = f_p \\ D_3 = n_e \\ D_4 = f_l \\ D_5 = f_i \\ D_6 = f_c \\ D_7 = f_L \end{cases} \tag{2}$$

so that our Statistical Drake equation may be simply rewritten as

$$N = \prod_{i=1}^{7} D_i. \tag{3}$$

Of course, N now becomes a (positive) random variable too, having its own (positive) mean value and standard deviation. Just as each of the $D_i$ has its own (positive) mean value and standard deviation…

… the natural question then arises: how are the seven mean values on the right related to the mean value on the left?

… and how are the seven standard deviations on the right related to the standard deviation on the left?

Just take the next step…

5. Step 2: introducing logs to change the product into a sum

Products of random variables are not easy to handle in probability theory. It is actually much easier to handle sums of random variables, rather than products, because:

1) The probability density of the sum of two or more independent random variables is the convolution of the relevant probability densities (worry not about the equations, right now).
2) The Fourier transform of the convolution simply is the product of the Fourier transforms (again, worry not about the equations, at this point).

So, let us take the natural logs of both sides of the Statistical Drake equation (3) and change it into a sum:

$$\ln(N) = \ln\left(\prod_{i=1}^{7} D_i\right) = \sum_{i=1}^{7} \ln(D_i). \tag{4}$$

It is now convenient to introduce eight new random variables (real, and no longer necessarily positive) defined as follows:

$$\begin{cases} Y = \ln(N) \\ Y_i = \ln(D_i), \quad i = 1, \ldots, 7. \end{cases} \tag{5}$$

Upon inversion, the first equation of (5) yields an important equation that will be used in the sequel:

$$N = e^{Y}. \tag{6}$$

We are now ready to take Step 3.

6. Step 3: the transformation law of random variables

So far we did not mention at all the problem: “which probability distribution shall we attach to each of the seven (positive) random variables $D_i$?”

It is not easy to answer this question because we do not have the least scientific clue as to which probability distributions best fit each of the seven points listed in Section 2.

Yet, at least one trivial error must be avoided: claiming that each of those seven random variables must have a Gaussian (i.e. normal) distribution. In fact, the Gaussian distribution, having the well-known bell-shaped probability density function

$$f_X(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \qquad (\sigma \ge 0) \tag{7}$$

has its independent variable x ranging between $-\infty$ and $\infty$ and so it can apply to a real random variable X only, and never to positive random variables like those in the Statistical Drake equation (3).

Searching again for probability density functions that represent positive random variables, an obvious choice would be the gamma distributions (see, for instance, Ref. [6]). However, we discarded this choice too, for a different reason: keep in mind that, according to (5), once we have selected a particular type of probability density function (pdf) for the seven $D_i$ appearing in Eq. (5), we must then compute the (new and different) pdf of the logs of such random variables. And the pdf of these logs certainly is not gamma-type any more.

It is high time now to remind the reader of an important theorem that is proved in probability courses, but, unfortunately, does not seem to have a specific name. It is the transformation law (so we shall call it, see, for instance, Ref. [5], pages 130–131) allowing us to compute the pdf of a certain new random variable Y that is a known function $Y = g(X)$ of another random variable X having a known pdf. In other words, if the pdf $f_X(x)$ of a certain random variable X is known, then the pdf $f_Y(y)$ of the new random variable Y, related to X by the functional relationship

$$Y = g(X) \tag{8}$$

can be calculated according to the following rule:

1) First invert the corresponding non-probabilistic equation $y = g(x)$ and denote by $x_i(y)$ the various real roots resulting from this inversion.
2) Second, take notice whether these real roots are finitely or infinitely many, according to the nature of the function $y = g(x)$.
3) Third, the probability density function of Y is then given by the (finite or infinite) sum

$$f_Y(y) = \sum_i \frac{f_X(x_i(y))}{\left| g'(x_i(y)) \right|} \tag{9}$$

where the summation extends to all roots $x_i(y)$ and $\left| g'(x_i(y)) \right|$ is the absolute value of the first derivative of $g(x)$ in which the ith root $x_i(y)$ has been substituted for x.
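The transformation law (9) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`transformed_pdf`, `square_roots`, and so on) are hypothetical, and the example function $g(x) = x^2$ applied to a standard normal X is a textbook case chosen here for illustration, not one taken from this paper. It has two real roots $\pm\sqrt{y}$ for $y > 0$, so the sum in (9) has two terms.

```python
import math

def transformed_pdf(f_x, roots, g_prime, y):
    """Eq. (9): sum of f_X(x_i) / |g'(x_i)| over the real roots x_i(y) of y = g(x)."""
    return sum(f_x(x) / abs(g_prime(x)) for x in roots(y))

def standard_normal_pdf(x):
    """Eq. (7) with mu = 0 and sigma = 1."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def square_roots(y):
    """Real roots of y = x**2 (illustrative choice of g): none for y <= 0, two for y > 0."""
    if y <= 0.0:
        return []
    r = math.sqrt(y)
    return [r, -r]

def square_derivative(x):
    """First derivative of g(x) = x**2."""
    return 2.0 * x

def pdf_of_square(y):
    """pdf of Y = X**2 for a standard normal X, obtained through Eq. (9)."""
    return transformed_pdf(standard_normal_pdf, square_roots, square_derivative, y)

if __name__ == "__main__":
    for y in (0.5, 1.0, 2.0):
        closed_form = math.exp(-0.5 * y) / math.sqrt(2.0 * math.pi * y)
        print(y, pdf_of_square(y), closed_form)
```

For this choice of g, the two terms of (9) add up to $e^{-y/2}/\sqrt{2\pi y}$, which the script prints next to the value computed from the roots.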

Since we must use this transformation law to pass from the $D_i$ to the $Y_i = \ln(D_i)$, it is clear that we need to start from a $D_i$ pdf that is as simple as possible. The gamma pdf does not meet this need because the analytic expression of the transformed pdf is very complicated. Also, the gamma distribution has two free parameters in it, and this “complicates” its application to the various meanings of the Drake equation. In conclusion, we discarded the gamma distributions and confined ourselves to the much simpler and much more practical uniform distribution instead, as shown in Section 7.

7. Step 4: assuming the easiest input distribution for each $D_i$: the uniform distribution

Let us now suppose that each of the seven $D_i$ is distributed UNIFORMLY in the interval ranging from the lower limit $a_i \ge 0$ to the upper limit $b_i \ge a_i$.

This is the same as saying that the probability density function of each of the seven Drake random variables $D_i$ has the equation

$$f_{\mathrm{uniform\_}D_i}(x) = \frac{1}{b_i - a_i} \quad \text{with } 0 \le a_i \le x \le b_i \tag{10}$$

that follows at once from the normalization condition

$$\int_{a_i}^{b_i} f_{\mathrm{uniform\_}D_i}(x)\,dx = 1. \tag{11}$$

Let us now consider the mean value of such uniform $D_i$, defined by

$$\langle \mathrm{uniform\_}D_i \rangle = \int_{a_i}^{b_i} x\, f_{\mathrm{uniform\_}D_i}(x)\,dx = \frac{1}{b_i - a_i}\int_{a_i}^{b_i} x\,dx = \frac{1}{b_i - a_i}\left[\frac{x^2}{2}\right]_{a_i}^{b_i} = \frac{b_i^2 - a_i^2}{2\,(b_i - a_i)} = \frac{a_i + b_i}{2}.$$

In words (as is intuitively obvious): the mean value of the uniform distribution simply is the mean of the lower and upper limits of the variable range

$$\langle \mathrm{uniform\_}D_i \rangle = \frac{a_i + b_i}{2}. \tag{12}$$

In order to find the variance of the uniform distribution, we first need to find the second moment

$$\langle \mathrm{uniform\_}D_i^2 \rangle = \int_{a_i}^{b_i} x^2 f_{\mathrm{uniform\_}D_i}(x)\,dx = \frac{1}{b_i - a_i}\int_{a_i}^{b_i} x^2\,dx = \frac{1}{b_i - a_i}\left[\frac{x^3}{3}\right]_{a_i}^{b_i} = \frac{b_i^3 - a_i^3}{3\,(b_i - a_i)} = \frac{(b_i - a_i)\left(a_i^2 + a_i b_i + b_i^2\right)}{3\,(b_i - a_i)} = \frac{a_i^2 + a_i b_i + b_i^2}{3}.$$

The second moment of the uniform distribution is thus

$$\langle \mathrm{uniform\_}D_i^2 \rangle = \frac{a_i^2 + a_i b_i + b_i^2}{3}. \tag{13}$$

From (12) and (13) we may now derive the variance of the uniform distribution

$$\sigma^2_{\mathrm{uniform\_}D_i} = \langle \mathrm{uniform\_}D_i^2 \rangle - \langle \mathrm{uniform\_}D_i \rangle^2 = \frac{a_i^2 + a_i b_i + b_i^2}{3} - \frac{(a_i + b_i)^2}{4} = \frac{(b_i - a_i)^2}{12}. \tag{14}$$

Upon taking the square root of both sides of (14), we finally obtain the standard deviation of the uniform distribution:

$$\sigma_{\mathrm{uniform\_}D_i} = \frac{b_i - a_i}{2\sqrt{3}}. \tag{15}$$

We now wish to perform a calculation that is mathematically trivial, but rather unexpected from the intuitive point of view, and very important for our applications to the Statistical Drake equation. Just consider the two simultaneous Eqs. (12) and (15)

$$\begin{cases} \langle \mathrm{uniform\_}D_i \rangle = \dfrac{a_i + b_i}{2} \\[2mm] \sigma_{\mathrm{uniform\_}D_i} = \dfrac{b_i - a_i}{2\sqrt{3}}. \end{cases} \tag{16}$$

Upon inverting this trivial linear system, one finds

$$\begin{cases} a_i = \langle \mathrm{uniform\_}D_i \rangle - \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_i} \\ b_i = \langle \mathrm{uniform\_}D_i \rangle + \sqrt{3}\,\sigma_{\mathrm{uniform\_}D_i}. \end{cases} \tag{17}$$

This is of paramount importance for our application to the Statistical Drake equation inasmuch as it shows that, if one (scientifically) assigns the mean value and standard deviation of a certain Drake random variable $D_i$, then the lower and upper limits of the relevant uniform distribution are given by the two Eqs. (17), respectively.

In other words, there is a factor of $\sqrt{3} = 1.732$ included in the two Eqs. (17) that is not obvious at all to human intuition, but must indeed be taken into account.

The application of this result to the Statistical Drake equation is discussed in the next section.

8. Step 5: computing the logs of the 7 uniformly distributed Drake random variables $D_i$

Intuitively speaking, the natural log of a uniformly distributed random variable cannot be another uniformly distributed random variable! This is obvious from the trivial diagram of $y = \ln(x)$ shown in Fig. 1.

Fig. 1. The simple function $y = \ln(x)$ (plot titled “Natural logarithm of x”; horizontal axis: positive independent variable x; vertical axis: real values of the natural log, $y = \ln(x)$).

So, if we have a uniformly distributed random variable $D_i$ with lower limit $a_i$ and upper limit $b_i$, the random variable

$$Y_i = \ln(D_i), \quad i = 1, \ldots, 7 \tag{18}$$

must have its range limited between the lower limit $\ln(a_i)$ and the upper limit $\ln(b_i)$. In other words, these are the lower and upper limits of the relevant probability density function $f_{Y_i}(y)$. But what is the actual analytic expression of such a pdf? To find it, we must resort to the general transformation law for random variables, defined by Eq. (9). Here we obviously have

$$y = g(x) = \ln(x). \tag{19}$$

That, upon inversion, yields the single root

$$x_1(y) = x(y) = e^{y}. \tag{20}$$

On the other hand, differentiating (19) one gets

$$g'(x) = \frac{1}{x} \quad \text{and} \quad g'(x_1(y)) = \frac{1}{x_1(y)} = \frac{1}{e^{y}} \tag{21}$$

where (20) was already used in the last step. By virtue of the uniform probability density function (10) and of (21), the general transformation law (9) finally yields

$$f_Y(y) = \sum_i \frac{f_X(x_i(y))}{\left| g'(x_i(y)) \right|} = \frac{1}{b_i - a_i} \cdot \frac{1}{\;\dfrac{1}{e^{y}}\;} = \frac{e^{y}}{b_i - a_i}. \tag{22}$$

In other words, the requested pdf of $Y_i$ is

$$f_{Y_i}(y) = \frac{e^{y}}{b_i - a_i}, \quad i = 1, \ldots, 7, \quad \text{with } \ln(a_i) \le y \le \ln(b_i). \tag{23}$$

These are the probability density functions of the natural logs of all the uniformly distributed Drake random variables $D_i$.

This is indeed a positive function of y over the interval $\ln(a_i) \le y \le \ln(b_i)$, as for every pdf, and it is easy to see that its normalization condition is fulfilled:

$$\int_{\ln(a_i)}^{\ln(b_i)} f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{e^{y}}{b_i - a_i}\,dy = \frac{e^{\ln(b_i)} - e^{\ln(a_i)}}{b_i - a_i} = 1. \tag{24}$$

Next we want to find the mean value and standard deviation of $Y_i$, since these play a crucial role in future developments. The mean value $\langle Y_i \rangle$ is given by

$$\langle Y_i \rangle = \int_{\ln(a_i)}^{\ln(b_i)} y\, f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{y\, e^{y}}{b_i - a_i}\,dy = \frac{b_i\left[\ln(b_i) - 1\right] - a_i\left[\ln(a_i) - 1\right]}{b_i - a_i}. \tag{25}$$

This is thus the mean value of the natural log of all the uniformly distributed Drake random variables $D_i$

$$\langle Y_i \rangle = \langle \ln(D_i) \rangle = \frac{b_i\left[\ln(b_i) - 1\right] - a_i\left[\ln(a_i) - 1\right]}{b_i - a_i}. \tag{26}$$

In order to find the variance also, we must first compute the mean value of the square of $Y_i$, that is

$$\langle Y_i^2 \rangle = \int_{\ln(a_i)}^{\ln(b_i)} y^2 f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} \frac{y^2 e^{y}}{b_i - a_i}\,dy = \frac{b_i\left[\ln^2(b_i) - 2\ln(b_i) + 2\right] - a_i\left[\ln^2(a_i) - 2\ln(a_i) + 2\right]}{b_i - a_i}. \tag{27}$$

The variance of $Y_i = \ln(D_i)$ is now given by (27) minus the square of (26), which, after a few reductions, yields:

$$\sigma^2_{Y_i} = \sigma^2_{\ln(D_i)} = 1 - \frac{a_i b_i \left[\ln(b_i) - \ln(a_i)\right]^2}{(b_i - a_i)^2}. \tag{28}$$

Whence the corresponding standard deviation

$$\sigma_{Y_i} = \sigma_{\ln(D_i)} = \sqrt{1 - \frac{a_i b_i \left[\ln(b_i) - \ln(a_i)\right]^2}{(b_i - a_i)^2}}. \tag{29}$$

Let us now turn to another topic: the use of Fourier transforms, which in probability theory are called “characteristic functions”. Following again the notations of Papoulis (Ref. [5]), we call “characteristic function”, $\Phi_{Y_i}(\zeta)$, of an assigned probability distribution $Y_i$, the Fourier transform of the relevant probability density function, that is (with $j = \sqrt{-1}$)

$$\Phi_{Y_i}(\zeta) = \int_{-\infty}^{\infty} e^{j\zeta y} f_{Y_i}(y)\,dy. \tag{30}$$

The use of characteristic functions simplifies things greatly. For instance, the calculation of all moments of a known pdf becomes trivial if the relevant characteristic function is known, and the proofs of important theorems of statistics, like the Central Limit Theorem that we will use in Section 9, are also greatly simplified. Another important result is that the characteristic function of the sum of a finite number of independent random variables is simply given by the product of the corresponding characteristic functions. This is just the case we are facing in the Statistical Drake equation (3), and so we are now led to find the characteristic function of the random variable $Y_i$, i.e.

$$\Phi_{Y_i}(\zeta) = \int_{-\infty}^{\infty} e^{j\zeta y} f_{Y_i}(y)\,dy = \int_{\ln(a_i)}^{\ln(b_i)} e^{j\zeta y}\,\frac{e^{y}}{b_i - a_i}\,dy = \frac{1}{b_i - a_i}\int_{\ln(a_i)}^{\ln(b_i)} e^{(1 + j\zeta) y}\,dy = \frac{1}{b_i - a_i}\cdot\frac{1}{1 + j\zeta}\left[e^{(1 + j\zeta) y}\right]_{\ln(a_i)}^{\ln(b_i)} = \frac{e^{(1 + j\zeta)\ln(b_i)} - e^{(1 + j\zeta)\ln(a_i)}}{(b_i - a_i)(1 + j\zeta)} = \frac{b_i^{\,1 + j\zeta} - a_i^{\,1 + j\zeta}}{(b_i - a_i)(1 + j\zeta)}. \tag{31}$$

Thus, the characteristic function of the natural log of the Drake uniform random variable $D_i$ is given by

$$\Phi_{Y_i}(\zeta) = \frac{b_i^{\,1 + j\zeta} - a_i^{\,1 + j\zeta}}{(b_i - a_i)(1 + j\zeta)}. \tag{32}$$

9. The central limit theorem (CLT) of statistics

Indeed there is a good, approximating analytical expression for $f_N(y)$, and this is the following lognormal probability density function

$$f_N(y; \mu, \sigma) = \frac{1}{y}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln(y) - \mu)^2}{2\sigma^2}} \qquad (y \ge 0,\ \sigma \ge 0). \tag{33}$$

To understand why, we must resort to what is perhaps the most beautiful theorem of Statistics: the Central Limit Theorem (abbreviated CLT). Historically, the CLT was in fact proven first in 1901 by the Russian mathematician Alexandr Lyapunov (1857–1918), and later (1920) by the Finnish mathematician Jarl Waldemar Lindeberg (1876–1932) under weaker conditions. These conditions are certainly fulfilled in the context of the Drake equation because of the “reality” of the , biology and sociology involved with it, and we are not going to discuss this point any further here. A good, synthetic description of the Central Limit Theorem (CLT) of Statistics is found at the Wikipedia site (Ref. [7]), to which the reader is referred for more details, such as the equations for the Lyapunov and the Lindeberg conditions, making the theorem “rigorously” valid.

Put in loose terms, the CLT states that, if one has a sum of independent random variables, even NOT identically distributed, this sum tends to a normal distribution when the number of terms making up the sum tends to infinity. Also, the normal distribution mean value is the sum of the mean values of the addend random variables, and the normal distribution variance is the sum of the variances of the addend random variables.

Let us now write down the equations of the CLT in the form needed to apply it to our Statistical Drake equation (3). The idea is to apply the CLT to the sum of random variables given by (4) and (5), whatever their probability distributions may be. In other words, the CLT applied to the Statistical Drake equation (3) leads immediately to the following three equations:

1) The sum of the (arbitrarily distributed) independent random variables $Y_i$ makes up the new random variable Y.
2) The sum of their mean values makes up the new mean value of Y.
3) The sum of their variances makes up the new variance of Y.

In equations

$$\begin{cases} Y = \displaystyle\sum_{i=1}^{7} Y_i \\[3mm] \langle Y \rangle = \displaystyle\sum_{i=1}^{7} \langle Y_i \rangle \\[3mm] \sigma_Y^2 = \displaystyle\sum_{i=1}^{7} \sigma_{Y_i}^2. \end{cases} \tag{34}$$

This completes our synthetic description of the CLT for sums of random variables.

10. The lognormal distribution is the distribution of the number N of extraterrestrial civilizations in the Galaxy

The CLT may of course be extended to products of random variables upon taking the logs of both sides, just as we did in Eq. (4). It then follows that the exponent random variable, like Y in (6), tends to a normal random variable and, as a consequence, that the base random variable, like N in (6), tends to a lognormal random variable.

To understand this fact better in mathematical terms, consider again the transformation law (9) of random variables. The question is: what is the probability density function of the random variable N in Eq. (6), that is, what is the probability density function of the lognormal distribution? To find it, set

$$y = g(x) = e^{x}. \tag{35}$$

This, upon inversion, yields the single root

$$x_1(y) = x(y) = \ln(y). \tag{36}$$

On the other hand, differentiating (35) one gets

$$g'(x) = e^{x} \quad \text{and} \quad g'(x_1(y)) = e^{\ln(y)} = y \tag{37}$$

where (36) was already used in the last step. The general transformation law (9) finally yields

$$f_N(y) = \sum_i \frac{f_X(x_i(y))}{\left| g'(x_i(y)) \right|} = \frac{1}{y}\, f_Y(\ln(y)). \tag{38}$$

Therefore, replacing the probability density on the right by the well-known normal (or Gaussian) distribution given by Eq. (7), the lognormal distribution of Eq. (33) is found, and the derivation of the lognormal distribution from the normal distribution is proved.

Table 1 summarizes all the properties of the lognormal distribution, whose demonstrations may be found in statistical textbooks (for instance see Refs. [7–9]). The last two lines in Table 1, however, are about our own discovery of Eqs. (26) and (28), yielding $\mu$ and $\sigma^2$ of the lognormal distribution of the output of the statistical Drake (and Seager) equation when all inputs are supposed to be uniformly distributed and both the mean value and standard deviation of each input are assigned. Remember that this mean value and standard deviation may be immediately converted into the uniform distribution lower and upper values thanks to the two Eqs. (17).

11. Data enrichment principle

It should be noticed that any positive number of random variables in the statistical Drake equation is compatible with the CLT. So, our generalization allows for many more factors to be added in the future as more refined scientific knowledge about each factor becomes available to scientists. This capability to make room for more future factors in the statistical Drake equation we call the “Data Enrichment Principle”, and we regard it as the key to more profound future results in the fields of Astrobiology and SETI.
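The chain from assigned means and standard deviations to the lognormal parameters can be sketched numerically as follows. This is a minimal Python sketch, not the author's code: the function names and the three input ranges in `EXAMPLE_LIMITS` are hypothetical, chosen only for illustration. It applies Eq. (17) for the limits, Eqs. (26) and (28) for each input, and the sums of Eq. (34), then compares the result with a seeded Monte Carlo estimate of the mean and variance of ln(N). All lower limits are assumed strictly positive so that the logs are finite.

```python
import math
import random

def uniform_limits(mean, std):
    """Eq. (17): limits (a, b) of the uniform distribution with given mean and std."""
    half_width = math.sqrt(3.0) * std
    return mean - half_width, mean + half_width

def log_mean(a, b):
    """Eq. (26): mean of ln(D) for D uniform on [a, b], assuming 0 < a < b."""
    return (b * (math.log(b) - 1.0) - a * (math.log(a) - 1.0)) / (b - a)

def log_variance(a, b):
    """Eq. (28): variance of ln(D) for D uniform on [a, b], assuming 0 < a < b."""
    return 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2

def lognormal_parameters(limits):
    """Eq. (34): mu and sigma^2 of Y = ln(N) as sums over all inputs."""
    mu = sum(log_mean(a, b) for a, b in limits)
    sigma2 = sum(log_variance(a, b) for a, b in limits)
    return mu, sigma2

def simulate_ln_n(limits, samples=20000, seed=0):
    """Monte Carlo estimate of the mean and variance of ln(N), N = product of uniforms."""
    rng = random.Random(seed)
    draws = [sum(math.log(rng.uniform(a, b)) for a, b in limits)
             for _ in range(samples)]
    mean = sum(draws) / samples
    variance = sum((y - mean) ** 2 for y in draws) / (samples - 1)
    return mean, variance

# Hypothetical input ranges (a_i, b_i), for illustration only.
EXAMPLE_LIMITS = [(0.2, 0.8), (1.0, 3.0), (0.05, 0.15)]

if __name__ == "__main__":
    print("analytic  (mu, sigma^2):", lognormal_parameters(EXAMPLE_LIMITS))
    print("simulated (mu, sigma^2):", simulate_ln_n(EXAMPLE_LIMITS))
```

With only three inputs the distribution of ln(N) is not yet Gaussian, but the sums in Eq. (34) hold exactly for the mean and variance of any number of independent inputs, so the analytic and simulated values should agree up to sampling error.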

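Table 1. Summary of the properties of the lognormal distribution that applies to the random variable N = number of ET communicating civilizations in the Galaxy.

| Property | Expression |
| --- | --- |
| Random variable | N = number of communicating ET civilizations in Galaxy |
| Probability distribution | Lognormal |
| Probability density function | $f_N(n; \mu, \sigma) = \dfrac{1}{n}\cdot\dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln(n) - \mu)^2}{2\sigma^2}} \quad (n \ge 0,\ \sigma \ge 0)$ |
| Mean value | $\langle N \rangle = e^{\mu}\, e^{\frac{\sigma^2}{2}}$ |
| Variance | $\sigma_N^2 = e^{2\mu}\, e^{\sigma^2}\left(e^{\sigma^2} - 1\right)$ |
| Standard deviation | $\sigma_N = e^{\mu}\, e^{\frac{\sigma^2}{2}} \sqrt{e^{\sigma^2} - 1}$ |
| All the moments, i.e. kth moment | $\langle N^k \rangle = e^{k\mu}\, e^{k^2 \frac{\sigma^2}{2}}$ |
| Mode (= abscissa of the lognormal peak) | $n_{\mathrm{mode}} \equiv n_{\mathrm{peak}} = e^{\mu}\, e^{-\sigma^2}$ |
| Value of the mode peak | $f_N(n_{\mathrm{mode}}) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\cdot e^{-\mu}\cdot e^{\frac{\sigma^2}{2}}$ |
| Median (= fifty–fifty probability value for N) | $\mathrm{median} = e^{\mu}$ |
| Skewness | $\dfrac{K_3}{(K_2)^{3/2}} = \left(e^{\sigma^2} + 2\right)\sqrt{e^{\sigma^2} - 1}$ |
| Kurtosis | $\dfrac{K_4}{(K_2)^{2}} = e^{4\sigma^2} + 2\,e^{3\sigma^2} + 3\,e^{2\sigma^2} - 6$ |
| Expression of $\mu$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$ | $\mu = \displaystyle\sum_{i=1}^{7} \langle Y_i \rangle = \sum_{i=1}^{7} \frac{b_i\left[\ln(b_i) - 1\right] - a_i\left[\ln(a_i) - 1\right]}{b_i - a_i}$ |
| Expression of $\sigma^2$ in terms of the lower ($a_i$) and upper ($b_i$) limits of the Drake uniform input random variables $D_i$ | $\sigma^2 = \displaystyle\sum_{i=1}^{7} \sigma_{Y_i}^2 = \sum_{i=1}^{7} \left(1 - \frac{a_i b_i \left[\ln(b_i) - \ln(a_i)\right]^2}{(b_i - a_i)^2}\right)$ |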
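The closed-form entries of Table 1 are easy to evaluate once $\mu$ and $\sigma$ are known. The following Python sketch is an illustration only: the function names and the dictionary layout are assumptions made here, not part of the paper. It evaluates the lognormal pdf, the kth moment and the main summary statistics, so that the table entries can be cross-checked against one another (for instance, the pdf evaluated at the mode should reproduce the "value of the mode peak").

```python
import math

def lognormal_pdf(n, mu, sigma):
    """Lognormal pdf of Table 1, for n > 0 and sigma > 0."""
    z = (math.log(n) - mu) / sigma
    return math.exp(-0.5 * z * z) / (n * sigma * math.sqrt(2.0 * math.pi))

def lognormal_moment(k, mu, sigma):
    """kth moment <N^k> = exp(k*mu) * exp(k^2 * sigma^2 / 2)."""
    return math.exp(k * mu + 0.5 * k * k * sigma * sigma)

def lognormal_summary(mu, sigma):
    """Mean, variance, standard deviation, mode, peak value and median from Table 1."""
    s2 = sigma * sigma
    variance = math.exp(2.0 * mu + s2) * (math.exp(s2) - 1.0)
    return {
        "mean": math.exp(mu + s2 / 2.0),
        "variance": variance,
        "std": math.sqrt(variance),
        "mode": math.exp(mu - s2),
        "peak": math.exp(-mu + s2 / 2.0) / (sigma * math.sqrt(2.0 * math.pi)),
        "median": math.exp(mu),
    }

if __name__ == "__main__":
    # mu and sigma below are arbitrary illustrative values.
    for name, value in lognormal_summary(mu=1.0, sigma=0.5).items():
        print(f"{name:>8}: {value:.6f}")
```

In practice $\mu$ and $\sigma^2$ would come from the last two rows of Table 1, i.e. from the limits $a_i$ and $b_i$ of the uniform inputs.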

12. The Statistical Seager Equation

We now come to consider the Statistical Seager Equation. As described in the Abstract of this paper, in 2013, MIT astrophysicist Sara Seager introduced what is now called the Seager equation (see Refs. [20,21]): it expresses the number N of exoplanets with detectable signs of life as the product of six factors:

1) $N_s$ = the number of stars observed,
2) $f_Q$ = the fraction of stars that are quiet,
3) $f_{HZ}$ = the fraction of stars with rocky planets in the Habitable Zone,
4) $f_O$ = the fraction of those planets that can be observed,
5) $f_L$ = the fraction that have life,
6) $f_S$ = the fraction on which life produces a detectable signature gas.

That is

$$N = N_s \cdot f_Q \cdot f_{HZ} \cdot f_O \cdot f_L \cdot f_S \tag{39}$$

This we call the “classical Seager equation”.

Mathematically speaking, Eq. (39) has exactly the same form as Eq. (1), with six factors instead of seven: only the words change, and, of course, so does its scientific meaning.

Mathematically, one may thus immediately apply to (39) the full string of mathematical theorems that we described in all Eqs. (2)–(32) of this paper.

The first such step is clearly the transformation of the classical Seager equation (39) into the Statistical Seager Equation (looking the same, mathematically), having all numeric inputs replaced by positive random variables. No more comments are necessary.

The second step is asking the two questions:

1) What is the probability distribution of each of the six input positive random variables in (39)?
2) And what is the probability distribution of the resulting output N?

Our answers to these two questions are:

1) No analytical solution exists as long as the number of input random variables is finite. Only numeric calculations may be done, but this requires writing a numeric code, which this author could not and would not write down because he is a mathematical physicist and not a computer programmer.
2) However, if we let the number of input random variables approach infinity (i.e. if we consider a high number of input random variables, like five to ten or even more; “how many” might be discussed later), then the solution of both the Statistical Drake equation (3) and the Statistical Seager Equation (39) is immediate. In both cases the probability distribution of the output random variable N is a Lognormal Distribution:
a) Its real parameter $\mu$ is given by the sum of the mean values of the natural logs of all input random variables, and
b) Its positive parameter $\sigma^2$ is given by the sum of the variances of the natural logs of all input random variables, whatever their probability distribution might be.

This is the result of applying the Central Limit Theorem of Statistics to both the Drake and Seager equations.

13. The extremely important particular case when the input random variables are uniform

Now we turn to the particular case of our theory when all the input random variables are uniform random variables. This case we regard as “extremely important” for all practical applications of both the Drake and the Seager statistical equations inasmuch as it is the only case when analytic formulae do exist expressing the two lognormal parameters $\mu$ and $\sigma$ directly in terms of the lower and upper limits of all the uniform input random variables.

In fact, define by

1) $a_i$ the real and positive number representing the LOWER LIMIT of the range of the ith uniform input random variable.
2) $b_i$ the real and positive number representing the UPPER LIMIT of the range of the ith uniform input random variable.
3) Clearly, it is assumed $b_i > a_i$.

Then

4) The mean value of the ith uniform input random variable is (obviously) given by (12), that is

$$\langle \mathrm{Uniform\_}D_i \rangle = \frac{a_i + b_i}{2} \tag{40}$$

5) The standard deviation of the ith uniform input random variable is (less obviously) given by (15), that is

$$\sigma_{\mathrm{Uniform\_}D_i} = \frac{b_i - a_i}{2\sqrt{3}}. \tag{41}$$

Above all, one has

6) The probability density function (pdf) of the output random variable N is the lognormal pdf (33), that is:

$$f_N(n) = \frac{1}{n}\cdot\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(\ln(n) - \mu)^2}{2\sigma^2}} \qquad (n \ge 0) \tag{42}$$

7) The real parameter $\mu$ appearing in the lognormal pdf (33) is expressed directly in terms of all the known $a_i$ and $b_i$ by Eq. (26), that is

$$\mu = \sum_{i=1}^{\mathrm{number\_of\_inputs}} \frac{b_i\left[\ln(b_i) - 1\right] - a_i\left[\ln(a_i) - 1\right]}{b_i - a_i}. \tag{43}$$

8) The real and positive parameter $\sigma^2$ appearing in the lognormal pdf (33) is expressed directly in terms of all the known $a_i$ and $b_i$ by Eq. (28), that is

$$\sigma^2 = \sum_{i=1}^{\mathrm{number\_of\_inputs}} \left(1 - \frac{a_i b_i \left[\ln(b_i) - \ln(a_i)\right]^2}{(b_i - a_i)^2}\right). \tag{44}$$

9) In addition to providing the pdf of N explicitly by virtue of the three Eqs. (42)–(44), this case of the all-uniform input random variables also is “realistic” in that the uniform probability distribution is “the most uncertain one” in the sense of Shannon's Information Theory. This is intuitively obvious (“when you do not know where to go, you look around and all directions are equal to each other”, i.e. your probability distribution in the azimuth is uniform between 0 and $2\pi$). But it may also be rigorously proven by applying the method of the Lagrange multipliers to the definition of Shannon's Entropy, as shown in the Appendix to this paper.

This completes our summary of the discoveries that this author made back in 2008 about the Statistical Drake Equation and so necessarily also about the 2013 Statistical Seager Equation. Especially useful is the simple case when all input variables are Uniformly distributed: then, the lognormal pdf of N is given by Eqs. (42)–(44).

14. Conclusion

The conclusion of this paper is that this author would respectfully advise Professor Sara Seager and her MIT Team to make use of Eqs. (42)–(44) in order to find the lognormal pdf of the number N of exoplanets with detectable signs of life, having previously estimated all the $a_i$ and $b_i$ numerically.

This research work could be part of the Phase A or B studies for the NASA planned TESS space mission, described at the web sites 〈http://tess.gsfc.nasa.gov/〉 and 〈http://en.wikipedia.org/wiki/Transiting_Exoplanet_Survey_Satellite〉.

Acknowledgements

The author wishes to acknowledge the cooperation and support of Professor Sara Seager and her Team at the two scientific meetings they had in 2014:

1) At the Istituto Nazionale di Astrofisica (INAF) in Milan, Italy, when they met on April 1st, 2014, and
2) At MIT, when they met on May 5th, 2014.

Appendix

Proof of Shannon's 1948 Theorem stating that the Uniform distribution is the “most uncertain” one over any Finite range of values.

As is well known, the Shannon entropy of any probability density function $p(x)$ is given by the integral

$$\mathrm{Shannon\_Entropy\_of\_}p(x) = -\int_{-\infty}^{\infty} p(x)\,\log p(x)\,dx. \tag{45}$$

In modern textbooks this is also called Shannon differential entropy.

Now consider the case when a probability density function $p(x)$ is limited to a finite interval $a \le x \le b$. This is obviously the case with any physical positive random variable, such as the number N of extraterrestrial communicating civilizations in the Galaxy. We now wish to prove that for any such finite random variable the maximum entropy distribution is the UNIFORM distribution over $a \le x \le b$.

Shannon did not bother to prove this simple theorem in his 1948 papers since he probably regarded it as just too trivial. But we prefer to point out this theorem since, in the
language of the statistical Drake equation, it sounds like: 10) All the statistical properties of the lognormal output “Since we don’t know what the probability distribution of

random variable N may be found analytically and are any one of the Drake random variables Di is, it is safer to listed in Table 1. assume that each of them has the maximum possible C. Maccone / Acta Astronautica 115 (2015) 277–285 285

entropy over $a \le x \le b$, i.e., that $D_i$ is UNIFORMLY distributed there".

The proof of this theorem is as follows:

1) Start by assuming $a_i \le x \le b_i$.
2) Then form the linear combination of the entropy integral plus the normalization condition for $D_i$

$$\delta \int_{a_i}^{b_i} \left[ -p(x) \log p(x) + \lambda\, p(x) \right] dx = 0 \tag{46}$$

where $\lambda$ is a Lagrange multiplier.

Performing the variation, i.e. differentiating with respect to $p(x)$, one finds

$$-\log p(x) - 1 + \lambda = 0 \tag{47}$$

that is

$$p(x) = e^{\lambda - 1}. \tag{48}$$

Applying the normalization condition (constraint) to the last expression for $p(x)$ yields

$$1 = \int_{a_i}^{b_i} p(x)\, dx = \int_{a_i}^{b_i} e^{\lambda - 1}\, dx = e^{\lambda - 1} \int_{a_i}^{b_i} dx = e^{\lambda - 1} (b_i - a_i) \tag{49}$$

that is

$$e^{\lambda - 1} = \frac{1}{b_i - a_i} \tag{50}$$

and finally, from (48) and (50)

$$p(x) = \frac{1}{b_i - a_i} \quad \text{with } a_i \le x \le b_i, \tag{51}$$

showing that the maximum-entropy probability distribution over any FINITE interval $a_i \le x \le b_i$ is just the UNIFORM distribution.

References

[1] 〈http://en.wikipedia.org/wiki/Drake_equation〉.
[2] 〈http://en.wikipedia.org/wiki/Search_for_Extra-Terrestrial_Intelligence〉.
[3] 〈http://en.wikipedia.org/wiki/Astrobiology〉.
[4] 〈http://en.wikipedia.org/wiki/Frank_Drake〉.
[5] A. Papoulis, S.U. Pillai, Probability, Random Variables and Stochastic Processes, fourth ed., Tata McGraw-Hill, New Delhi, ISBN 0-07-048658-1, 2002.
[6] 〈http://en.wikipedia.org/wiki/Gamma_distribution〉.
[7] 〈http://en.wikipedia.org/wiki/Central_limit_theorem〉.
[8] 〈http://en.wikipedia.org/wiki/Cumulant〉.
[9] 〈http://en.wikipedia.org/wiki/Median〉.
[10] C. Maccone, The Statistical Drake Equation, Paper #IAC-08-A4.1.4 presented on 1st October, 2008, at the 59th International Astronautical Congress (IAC), Glasgow, Scotland, UK, 29 September–3 October, 2008.
[11] C. Maccone, The Statistical Drake Equation, Acta Astronaut. 67 (2010) 1366–1383.
[12] C. Maccone, The Statistical , J. Br. Interplanet. Soc. 63 (2010) 222–239.
[13] C. Maccone, SETI and SEH (statistical equation for habitables), Acta Astronaut. 68 (2011) 63–75.
[14] C. Maccone, A mathematical model for evolution and SETI, Orig. Life Evolut. Biosph. (OLEB) 41 (2011) 609–619. Available online 3rd December, 2011.
[15] C. Maccone, Mathematical SETI, a 724-pages book published by Praxis-Springer in the fall of 2012. ISBN-10: 3642274366; ISBN-13: 978-3642274367, edition: 2012.
[16] C. Maccone, SETI, evolution and human history merged into a mathematical model, Int. J. Astrobiol. 12 (3) (2013) 218–245. Available online since April 23, 2013.
[17] C. Maccone, Evolution and history in a new "Mathematical SETI" model, Acta Astronaut. 93 (2014) 317–344. Available online since 13 August 2013.
[18] C. Maccone, SETI as a part of big history, Acta Astronaut. 101 (2014) 67–80.
[19] C. Maccone, Evolution and mass extinctions as lognormal stochastic processes, Int. J. Astrobiol. 13 (4) (2014) 290–309.
[20] 〈http://en.wikipedia.org/wiki/Sara_Seager#Seager_equation〉.
[21] P. Gilster, Astrobiology: Enter the Seager Equation, Centauri Dreams, 11 September 2013, 〈http://www.centauri-dreams.org/?p=28976〉.
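
Numerical sketch of the all-uniform case. The recipe of Section 13 (estimate every lower limit a_i and upper limit b_i, then evaluate Eqs. (40)–(44)) may be illustrated by a few lines of Python. This is only a minimal sketch under stated assumptions: the function names (`uniform_mean_std`, `lognormal_parameters`, `lognormal_pdf`) are hypothetical, and the six input ranges in `example_limits` are invented purely for illustration; they are not taken from this paper or from Seager's work. The mean value of N is computed from the standard lognormal formula exp(μ + σ²/2).

```python
import math


def uniform_mean_std(a, b):
    """Mean and standard deviation of a uniform variable on [a, b], Eqs. (40)-(41)."""
    return (a + b) / 2.0, (b - a) / (2.0 * math.sqrt(3.0))


def lognormal_parameters(limits):
    """Return (mu, sigma2) of the lognormal pdf of N, Eqs. (43)-(44).

    `limits` is a sequence of (a_i, b_i) pairs with 0 < a_i < b_i.
    """
    mu = 0.0
    sigma2 = 0.0
    for a, b in limits:
        if not 0.0 < a < b:
            raise ValueError("each pair must satisfy 0 < a_i < b_i")
        # Eq. (43): mean value of ln(D_i) for D_i uniform on [a_i, b_i]
        mu += (b * (math.log(b) - 1.0) - a * (math.log(a) - 1.0)) / (b - a)
        # Eq. (44): variance of ln(D_i) for D_i uniform on [a_i, b_i]
        sigma2 += 1.0 - a * b * (math.log(b) - math.log(a)) ** 2 / (b - a) ** 2
    return mu, sigma2


def lognormal_pdf(n, mu, sigma2):
    """Lognormal pdf of Eq. (42); taken to be 0 for n <= 0."""
    if n <= 0.0:
        return 0.0
    sigma = math.sqrt(sigma2)
    exponent = -((math.log(n) - mu) ** 2) / (2.0 * sigma2)
    return math.exp(exponent) / (n * math.sqrt(2.0 * math.pi) * sigma)


# ASSUMPTION: invented ranges (a_i, b_i) for the six inputs Ns, fQ, fHZ, fO, fL, fS.
# They are placeholders for illustration only, not estimates from the paper.
example_limits = [
    (1.0e4, 5.0e4),  # Ns
    (0.1, 0.3),      # fQ
    (0.05, 0.25),    # fHZ
    (0.001, 0.01),   # fO
    (0.5, 1.0),      # fL
    (0.2, 0.8),      # fS
]

mu, sigma2 = lognormal_parameters(example_limits)
mean_N = math.exp(mu + sigma2 / 2.0)  # standard mean value of a lognormal variable
print("mu =", mu, "sigma^2 =", sigma2, "mean of N =", mean_N)
print("pdf at the mean value:", lognormal_pdf(mean_N, mu, sigma2))
```

Replacing `example_limits` by numerically estimated a_i and b_i yields the two lognormal parameters directly, with no Monte Carlo simulation needed; this is the practical advantage of the all-uniform case.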