A Generalization of the Erd\H {O} S-Kac Theorem
Total Page:16
File Type:pdf, Size:1020Kb
A GENERALIZATION OF THE ERDŐS-KAC THEOREM JOSEPH SQUILLACE UNIVERSITY OF RHODE ISLAND DEPARTMENT OF COMPUTER SCIENCE & STATISTICS Abstract. Given n 2 N, let ! (n) denote the number of distinct prime factors of n, let Z denote a standard normal variable, and let Pn denote the uniform distribution on f1; : : : ; ng. The Erdős-Kac Theorem states that 1=2 Pn m ≤ n : ! (m) − log log n ≤ x (log log n) ! P (Z ≤ x) as n ! 1; i.e., if N (n) is a uniformly distributed variable on f1; : : : ; ng, then ! (N (n)) is asymptotically normally distributed as n ! 1 with both mean and variance equal to log log n. The contribution of this paper is a generalization of the Erdős-Kac Theorem to a larger class of random variables by considering per- 1 turbations of the uniform probability mass n in the following sense. Denote by Pn a probability distribution 1 on f1; : : : ; ng given by Pn (i) = n + "i;n. By providing some constraints on the "i;n’s, sufficient conditions are stated in order to conclude that 1=2 Pn m ≤ n : ! (m) − log log n ≤ x (log log n) ! P (Z ≤ x) as n ! 1: The main result will be applied to prove that the number of distinct prime factors of a positive integer with either the Harmonic(n) distribution or the Zipf(n; s) distribution also tends to the normal distribution N (log log n; log log n) as n ! 1 (and as s ! 1 in the case of a Zipf variable). 1. Introduction Given a natural number n, the number of distinct prime factors of n is denoted ! (n). For example, 3 2 P ! 2 5 7 = 3. The function ! may be written as ! (n) = pjn 1, where the sum is over all prme factors of n. In 1917, Hardy and Ramanujan (p. 270 of [3]) proved that the number of distinct prime factors of a natural number n is about log log n. In particular, they showed that the normal order of ! (n) is log log n−i.e., for every " > 0, the proportion of the natural numbers for which the inequalities (1 − ") log log n ≤ ! (n) ≤ (1 + ") log log n do not hold tends to 0 as n ! 1. Informally speaking, the Erdős-Kacp Theorem generalizes the Hardy- Ramanujan Theorem by showing that ! (n) is about log log n + Z · log log n, where Z ∼ N (0; 1). More precisely, the Erdős-Kac Theorem is the following result (p. 738 of [2]). 1 Theorem 1. Let Pn denote the uniform distribution on f1; 2; ; : : : ; ng. As n ! 1, 1=2 Pn m ≤ n : ! (m) − log log n ≤ x (log log n) ! P (Z ≤ x) : arXiv:2011.00152v1 [math.NT] 31 Oct 2020 Figure 1 shows plots of the values of ! (n) for n between, respectively, 1-100, 1-1000, and 1-10000. Figure 1. Some values of ! (n) plotted using Mathematica. 1 I.e., if U is uniformly distributed on f1; 2; : : : ; ng, then for any subset A ⊆ f1; 2; : : : ; ng, Pn (A) = P (U 2 A). 1 A GENERALIZATION OF THE ERDŐS-KAC THEOREM 2 Furthermore, Figure 1 illustrates how slowly the values of ! (n) grow along with their variation, and this is consistent with the parameters µ = σ2 = log log n appearing in the limiting normal distribution. Theorem 1 4 4 suggests that, for example, since log log e(e ) = 4, integers near e(e ) ≈ 514843556263457212366848 have, on average, 4 distinct prime factors. Using Mathematica, computing the mean of ! (n) for n within 1000000 of 514843556263457212366848 yields N[Mean[Table[PrimeNu[n], n, 514843556263457211366848, 514843556263457213366848]], 3] = 4:27: Several generalizations of Theorem 1 exist. For example, Liu [4] extends Theorem 1 to the setting of free abelian modules other than the positive integers. A recent generalization of Theorem 1 was considered in [7], where Sun and Wu showed that Theorem 1 corresponds to a special case of Theorem 1 of [7] once the z −v2 R l 2 parameter l is set to 0. In particular, they considered integrals of the form −∞ v e dv and bounds for sums of the form l 1 X ! (n) − log log x p ; x log log x !(n)−log log n n≤x; p ≤z log log n upon setting l = 0, these expressions become the CDF of the standard normal distribution and the propor- !(n)−log log n p tion of n satisfying log log n ≤ z, respectively. A generalization given by Saidak [6], assumes a quasi Generalized Riemann Hypothesis for Dedekind zeta functions and concerns a function fa (p) defined to be the minimal e for which ae ≡ 1 mod p. Saidak shows that the limit of n !(f (p))−log log p o ap p ≤ x : A ≤ log log p ≤ B π (x) as x ! 1 tends to the standard normal CDF Φ(B) − Φ(A). There are several generalizations of Theorem 1 to algebraic number fields, such as [5] where Pollack considers the number of principal ideals. A similarity in the generalizations mentioned is that they consider limiting distributions of ! (·) for uniformly distributed random integers. The contribution of this paper is to extend the Erdős-Kac Theorem to a larger class of random variables, other thanp a uniformly distributed variable, on [n] := f1; 2; : : : ; ng which also have, asymptotically, log log n + Z · log log n many distinct prime factors. 1.1. The Main Theorem. Define a probability distribution Pn on [n] given by 1 (1) (i) = + " ; Pn n i;n and impose the constraint that for each k-tuple (p1; : : : ; pk) consisting of distinct primes, j n k p1···pk X (2) lim "lp ···p ;n = 0: n!1 1 k l=1 The constraint (2) is applied in x2 where an analogue of Kac’s heuristic for Pn is provided, and the analogue of Kac’s heuristic suggests that for a Pn-distributed variable X, the events fp1 divides Xg ;:::; fpk divides Xg are independent when p1; : : : pk are distinct primes. Analogous to the case of the development of Theorem 1, it was the independence of these events that suggested a Gaussian law of errors. Moreover, it will be shown 1 1 that (2) implies limn!1 Pn (p divides X) = p (and it is easy to show that limn!1 Pn (p divides X) = p ). Due to the axioms of probability, the "i;n’s satisfy n X (3) "i;n =0; i=1 1 1 (4) " 2 − ; 1 − : i;n n n The motivation for defining Pn in terms of the uniform distribution is due to Durrett’s proof (Theorem 3.4.16 in [1]) of the Erdős-Kac Theorem. Replacing the uniform distribution Pn with the new distribution Pn in Durrett’s proof naturally yields some constraints that the terms "i;n; i ≤ n; must satisfy in order to conclude A GENERALIZATION OF THE ERDŐS-KAC THEOREM 3 p that an integer-valued random variable with the Pn distribution has about log log n + Z · log log n distinct prime factors. Our main result is the following theorem, where b·c denotes the floor function. Theorem 2. Let Z ∼ N (0; 1). Suppose the following statements are true. • There exists a constant C such that for all n and for all primes p with p > n1= log log n, n b p c X C (5) " ≤ : lp;n p l=1 • There exists a constant D such that j n k p1···pk X D (6) 0 ≤ " ≤ lp1···pk;n n l=1 for all n and, for each k, all k-tuples (p1; : : : ; pk) consisting of distinct primes of size at most n1= log log n. ∗ Let Pn denote a probability distribution obtained by imposing the constraints (5) and (6) on Pn. As n ! 1; ∗ 1=2 Pn m ≤ n : ! (m) − log log n ≤ x (log log n) ! P (Z ≤ x) : ∗ Remark. If "i;n = 0 for all i ≤ n, then Pn = Pn and Theorem 1 is obtained. 1.2. Outline. In x2, Kac’s heuristic for the number of distinct prime factors of a uniformly distributed variable on N is stated, and then an analogue of the heuristic is provided for any Pn-distributed random variable. Just as Kac’s heuristic suggested the appearance of the normal distribution in Theorem 1 along with 2 the parameters µ = σ = log log n, the analogue of Kac’s heuristic for Pn is the motivation for the conclusion of Theorem 2. The proof of Theorem 2 is provided in x3; the proof applies the method of moments and is motivated by Durrett’s proof of the Erdős-Kac Theorem (Theorem 3.4.16 in [1]). Moreover, in x3, the constraints (5) and (6) are applied to ensure that Pn also satisfies Durrett’s method of moment bounds. In x4, the values "i;n; i ≤ n; are given for the Harmonic distribution on [n], and it is shown that the "i;n’s ∗ satisfy constraints (2), (5), and (6). Therefore, the conclusion of Theorem 2 is true if Pn is replaced with ∗ the harmonic distribution. In a similar fashion, it is shown that a Zipf(n; s)-distribution can also replace Pn in Theorem 2, as long as s ! 1 as n ! 1. I.e., the number of distinct prime factors of an integer-valued random variable from the Harmonic(n) distribution distribution tends to N (log log n; log log n) as n ! 1; and the number of distinct prime factors of an integer-valued random variable from the Zipf(n; s) distribution tends to N (log log n; log log n) as n ! 1 and s ! 1.