A GENERALIZATION OF THE ERDŐS-KAC THEOREM

JOSEPH SQUILLACE UNIVERSITY OF RHODE ISLAND DEPARTMENT OF COMPUTER SCIENCE &

Abstract. Given n ∈ N, let ω (n) denote the number of distinct prime factors of n, let Z denote a standard normal variable, and let Pn denote the uniform distribution on {1, . . . , n}. The Erdős-Kac Theorem states that  1/2 Pn m ≤ n : ω (m) − log log n ≤ x (log log n) → P (Z ≤ x) as n → ∞; i.e., if N (n) is a uniformly distributed variable on {1, . . . , n}, then ω (N (n)) is asymptotically normally distributed as n → ∞ with both and equal to log log n. The contribution of this paper is a generalization of the Erdős-Kac Theorem to a larger class of random variables by considering per- 1 turbations of the uniform probability mass n in the following sense. Denote by Pn a 1 on {1, . . . , n} given by Pn (i) = n + εi,n. By providing some constraints on the εi,n’s, sufficient conditions are stated in order to conclude that  1/2 Pn m ≤ n : ω (m) − log log n ≤ x (log log n) → P (Z ≤ x) as n → ∞. The main result will be applied to prove that the number of distinct prime factors of a positive integer with either the Harmonic(n) distribution or the Zipf(n, s) distribution also tends to the N (log log n, log log n) as n → ∞ (and as s → 1 in the case of a Zipf variable).

1. Introduction Given a natural number n, the number of distinct prime factors of n is denoted ω (n). For example, 3 2  P ω 2 5 7 = 3. The function ω may be written as ω (n) = p|n 1, where the sum is over all prme factors of n. In 1917, Hardy and Ramanujan (p. 270 of [3]) proved that the number of distinct prime factors of a natural number n is about log log n. In particular, they showed that the normal order of ω (n) is log log n−i.e., for every ε > 0, the proportion of the natural numbers for which the inequalities (1 − ε) log log n ≤ ω (n) ≤ (1 + ε) log log n

do not hold tends to 0 as n → ∞. Informally speaking, the Erdős-Kac√ Theorem generalizes the Hardy- Ramanujan Theorem by showing that ω (n) is about log log n + Z · log log n, where Z ∼ N (0, 1). More precisely, the Erdős-Kac Theorem is the following result (p. 738 of [2]). 1 Theorem 1. Let Pn denote the uniform distribution on {1, 2, , . . . , n}. As n → ∞,  1/2 Pn m ≤ n : ω (m) − log log n ≤ x (log log n) → P (Z ≤ x) . arXiv:2011.00152v1 [math.NT] 31 Oct 2020 Figure 1 shows plots of the values of ω (n) for n between, respectively, 1-100, 1-1000, and 1-10000.

Figure 1. Some values of ω (n) plotted using Mathematica.

1 I.e., if U is uniformly distributed on {1, 2, . . . , n}, then for any subset A ⊆ {1, 2, . . . , n}, Pn (A) = P (U ∈ A). 1 A GENERALIZATION OF THE ERDŐS-KAC THEOREM 2

Furthermore, Figure 1 illustrates how slowly the values of ω (n) grow along with their variation, and this is consistent with the parameters µ = σ2 = log log n appearing in the limiting normal distribution. Theorem 1  4  4 suggests that, for example, since log log e(e ) = 4, integers near e(e ) ≈ 514843556263457212366848 have, on average, 4 distinct prime factors. Using Mathematica, computing the mean of ω (n) for n within 1000000 of 514843556263457212366848 yields N[Mean[Table[PrimeNu[n], n, 514843556263457211366848, 514843556263457213366848]], 3] = 4.27. Several generalizations of Theorem 1 exist. For example, Liu [4] extends Theorem 1 to the setting of free abelian modules other than the positive integers. A recent generalization of Theorem 1 was considered in [7], where Sun and Wu showed that Theorem 1 corresponds to a special case of Theorem 1 of [7] once the z −v2 R l 2 parameter l is set to 0. In particular, they considered integrals of the form −∞ v e dv and bounds for sums of the form l 1 X ω (n) − log log x √ ; x log log x ω(n)−log log n n≤x, √ ≤z log log n upon setting l = 0, these expressions become the CDF of the standard normal distribution and the propor- ω(n)−log log n √ tion of n satisfying log log n ≤ z, respectively. A generalization given by Saidak [6], assumes a quasi Generalized Riemann Hypothesis for Dedekind zeta functions and concerns a function fa (p) defined to be the minimal e for which ae ≡ 1 mod p. Saidak shows that the limit of

n ω(f (p))−log log p o a√ p ≤ x : A ≤ log log p ≤ B π (x) as x → ∞ tends to the standard normal CDF Φ(B) − Φ(A). There are several generalizations of Theorem 1 to algebraic number fields, such as [5] where Pollack considers the number of principal ideals. A similarity in the generalizations mentioned is that they consider limiting distributions of ω (·) for uniformly distributed random integers. The contribution of this paper is to extend the Erdős-Kac Theorem to a larger class of random variables, other than√ a uniformly distributed variable, on [n] := {1, 2, . . . , n} which also have, asymptotically, log log n + Z · log log n many distinct prime factors.

1.1. The Main Theorem. Define a probability distribution Pn on [n] given by  1  (1) (i) = + ε , Pn n i,n

and impose the constraint that for each k-tuple (p1, . . . , pk) consisting of distinct primes,

j n k p1···pk X (2) lim εlp ···p ,n = 0. n→∞ 1 k l=1

The constraint (2) is applied in §2 where an analogue of Kac’s heuristic for Pn is provided, and the analogue of Kac’s heuristic suggests that for a Pn-distributed variable X, the events {p1 divides X} ,..., {pk divides X} are independent when p1, . . . pk are distinct primes. Analogous to the case of the development of Theorem 1, it was the independence of these events that suggested a Gaussian law of errors. Moreover, it will be shown 1 1 that (2) implies limn→∞ Pn (p divides X) = p (and it is easy to show that limn→∞ Pn (p divides X) = p ). Due to the axioms of probability, the εi,n’s satisfy n X (3) εi,n =0, i=1  1 1  (4) ε ∈ − , 1 − . i,n n n

The motivation for defining Pn in terms of the uniform distribution is due to Durrett’s proof (Theorem 3.4.16 in [1]) of the Erdős-Kac Theorem. Replacing the uniform distribution Pn with the new distribution Pn in Durrett’s proof naturally yields some constraints that the terms εi,n, i ≤ n, must satisfy in order to conclude A GENERALIZATION OF THE ERDŐS-KAC THEOREM 3 √ that an integer-valued with the Pn distribution has about log log n + Z · log log n distinct prime factors. Our main result is the following theorem, where b·c denotes the floor function.

Theorem 2. Let Z ∼ N (0, 1). Suppose the following statements are true.

• There exists a constant C such that for all n and for all primes p with p > n1/ log log n,

n b p c X C (5) ε ≤ . lp,n p l=1

• There exists a constant D such that

j n k p1···pk X D (6) 0 ≤ ε ≤ lp1···pk,n n l=1

for all n and, for each k, all k-tuples (p1, . . . , pk) consisting of distinct primes of size at most n1/ log log n.

∗ Let Pn denote a probability distribution obtained by imposing the constraints (5) and (6) on Pn. As n → ∞,

∗  1/2 Pn m ≤ n : ω (m) − log log n ≤ x (log log n) → P (Z ≤ x) .

∗ Remark. If εi,n = 0 for all i ≤ n, then Pn = Pn and Theorem 1 is obtained.

1.2. Outline. In §2, Kac’s heuristic for the number of distinct prime factors of a uniformly distributed variable on N is stated, and then an analogue of the heuristic is provided for any Pn-distributed random variable. Just as Kac’s heuristic suggested the appearance of the normal distribution in Theorem 1 along with 2 the parameters µ = σ = log log n, the analogue of Kac’s heuristic for Pn is the motivation for the conclusion of Theorem 2. The proof of Theorem 2 is provided in §3; the proof applies the method of moments and is motivated by Durrett’s proof of the Erdős-Kac Theorem (Theorem 3.4.16 in [1]). Moreover, in §3, the constraints (5) and (6) are applied to ensure that Pn also satisfies Durrett’s method of bounds. In §4, the values εi,n, i ≤ n, are given for the Harmonic distribution on [n], and it is shown that the εi,n’s ∗ satisfy constraints (2), (5), and (6). Therefore, the conclusion of Theorem 2 is true if Pn is replaced with ∗ the harmonic distribution. In a similar fashion, it is shown that a Zipf(n, s)-distribution can also replace Pn in Theorem 2, as long as s → 1 as n → ∞. I.e., the number of distinct prime factors of an integer-valued random variable from the Harmonic(n) distribution distribution tends to N (log log n, log log n) as n → ∞; and the number of distinct prime factors of an integer-valued random variable from the Zipf(n, s) distribution tends to N (log log n, log log n) as n → ∞ and s → 1.

2. Kac’s Heuristic and its Analogue for Pn Kac’s heuristic2 for the uniform distribution (pp. 154-155 of [1]) suggests the statement of Theorem 1 and is based on the fact that given a random integer n ∈ N, the events {p divides n} and {q divides n} are independent when p and q are distinct primes.

2The heuristic is based on the connection between independence and a Gaussian law of errors. A GENERALIZATION OF THE ERDŐS-KAC THEOREM 4

Now consider the probability distribution Pn and its behavior in the limit. Given a prime p, let Ap denote the set of positive integers divisible by p. Letting 1{·} denote an indicator random variable,

∞ (Ap) := lim n (Ap) P n→∞ P n b p c X = lim n (lp) n→∞ P l=1 n b p c   (1) X 1 = lim 1{lp≤n} + εlp,n n→∞ n l=1 j k n n b p c p X = lim + lim εlp,n n→∞ n n→∞ l=1 (2) 1 = . p

If q 6= p is another prime, then, similarly,

1 X (A ∩ A ) = + ε P∞ p q pq pql l≥1 (2) 1 = pq = P∞ (Ap) P∞ (Aq) .

Therefore, the events Ap and Aq are independent. In general, (2) ensures that, for any positive integer k,

the events Ap1 ,...,Apl are independent for 1 < l ≤ k. Let δp (n) = 1p|n so that

X ω (n) = δp (n) p≤n

is the number of distinct prime factors of n. The indicator variables δp behave like Bernoulli variables Xp that are independent and identically distributed with

1 (X = 1) = , P p p P (Xp = 0) = 1 − 1/p.

P The mean of p≤n Xp is

  X X E  Xp = P (Xp = 1) p≤n p≤n X 1 = , p p≤n A GENERALIZATION OF THE ERDŐS-KAC THEOREM 5

P and the variance of p≤n Xp is   X X Var  Xp = Var (Xp) p≤n p≤n

X  2 2 = E Xp − (E (Xp)) p≤n X 2 X  2 = E Xp − (E (Xp)) p≤n p≤n 2! X X 1 = (X ) − E p p p≤n p≤n X 1 X 1 = − . p p2 p≤n p≤n P 1 P 1 Note that p≤n p = log log n + O (1) by Merten’s Second Theorem (p. 22 of [8]); the series p≤n p2 P 1 converges (by comparison with 1≤n n2 ). Therefore, the mean and variance are X 1 = log log n + O (1) , p p≤n X 1 X 1 − = log log n + O (1) . p p2 p≤n p≤n

This concludes the analogue of Kac’s heuristic for Pn and justifies the parameters of the normal distribution appearing in Theorem 2.

3. Proving Theorem 2 1/ log log n Define αn := n . Lemma 3. As n → ∞   n  b p c X 1 X 1/2   + ε  / (log log n) → 0.  p lp,n αn

Proof. Given n and any prime p with p > αn, j k n n b p c p (4) X (6) D − ≤ ε ≤ . n lp,n p l=1 Therefore,

n b p c 1 X  D + 1 (7) + ε ∈ 0, p lp,n p l=1 for all n. Thus,   n  b p c X 1 X 1/2   + ε  / (log log n) → 0  p lp,n αn

The following lemma is proved by Durrett (p. 156 of [1]).

ε Lemma 4. If ε > 0, then αn ≤ n for large n and hence αr (8) n → 0 n for all r < ∞. Proof of Theorem 2. Let g (m) = P δ (m) and let denote expectation with respect to ∗ . Then n p≤αn p En Pn   X X ∗ En  δp = Pn (m : δp (m) = 1) αn

n b p c X X ∗ = Pn (m : m = lp) αn

n b p c   (1) X X 1 = + ε n lp,n αn

n n b p c b p c X X 1 X X = + ε n lp,n αn

n b p c X 1 X X ≤ + ε , p lp,n αn

p≤αn bn := E (Sn) , 2 an := Var (Sn) . 2  1/2 By Lemma 3, bn and an are both log log n + o (log log n) , so it suffices to show ∗ Pn (m : gn (m) − bn ≤ xan) → P (Z ≤ x) .

An application of Theorem 3.4.5 of [1] shows (Sn − bn) /an → Z, and since |Xp| ≤ 1, it follows from Durrett’s r r second proof of Theorem 3.4.5 [1] that E ((Sn − bn) /an) → E (Z ) for all r. Using the notation from that proof (and replacing ij by pj) it follows that r X X r! 1 X (Sr ) = Xr1 ··· Xrk  , E n E p1 pk r1! ··· rk! k! k=1 ri pj where the sum P extends over all k-tuples of positive integers for which r +···+r = r, and P extends ri 1 k pj over all k-tuples of distinct primes in [n]. Since X ∈ {0, 1}, the summand in P Xr1 ··· Xrk  is p pj E p1 pk 1 E (Xp1 ··· Xpk ) = p1 ··· pk A GENERALIZATION OF THE ERDŐS-KAC THEOREM 7

by independence of the Xp’s. Moreover,

En (δp1 ··· δpk ) ≤ Pn (m : δp1 (m) = δp2 (m) ··· = δpk (m) = 1)   n   = Pn m : m = p1 ··· pk, 2p1 . . . pk,..., p1 ··· pk p1 ··· pk

j n k p1···pk X = Pn (m : m = lp1 ··· pk) l=1

j n k p1···pk   (1) X 1 = + ε n lp1···pk,n l=1 j k j k n n p1···pk p1···pk X = + ε . n lp1···pk,n l=1

The two moments differ by at most

 j n k j n k  j n k j n k  p1···pk p1···pk   1 p1···pk X p1···pk X 1  max − − εlp1···pk,n, + εlp1···pk,n − . p1 ··· pk n n p1 ··· pk  l=1 l=1 

Further,

j n k j n k j n k p1···pk n − 1 p1···pk 1 p1···pk X 1 p1···pk X − − εlp1···pk,n ≤ − − εlp1···pk,n p1 ··· pk n p1 ··· pk n l=1 l=1

j n k p1···pk 1 X = − ε , n lp1···pk,n l=1

and

j k j k j k n n n p1···pk p1···pk p1···pk X 1 X + εlp1···pk,n − ≤ εlp1···pk,n. n p1 ··· pk l=1 l=1

Thus, the maximum becomes

 j n k j n k  p1···pk p1···pk  1 X X  max − ε , ε . n lp1···pk,n lp1···pk,n  l=1 l=1 

j n k j n k j n k 1 P p1···pk P p1···pk 1 P p1···pk The maximum equals n − l=1 εlp1···pk,n if and only if l=1 εlp1···pk,n ≤ n − l=1 εlp1···pk,n, j n k P p1···pk 1/2 and the latter is equivalent to l=1 εlp1···pk,n ≤ n . On the other hand, if the maximum equals j n k j n k P p1···pk P p1···pk C l=1 εlp1···pk,n, then inequality (5) implies l=1 εlp1···pk,n ≤ n . Therefore, the two rth moments differ by A GENERALIZATION OF THE ERDŐS-KAC THEOREM 8

  j n k j n k  r p ···p p ···p r! 1  1 1 k 1 k  r r X X X  X X  |E (Sn) − En (gn)| ≤ max − εlp1···pk,n, εlp1···pk,n r1! ··· rk! k!  n  k=1 ri pj  l=1 l=1  r  1 X X r! 1 X max C, ≤ 2 r1! ··· rk! k! n k=1 ri pj  r  1 max C, X ≤ 2 1 n   p≤αn  1  αr ≤ max C, n 2 n (8) → 0.

This completes the proof of Theorem 2. 

4. Illustrative Examples ∗ In this section, examples are given to help give a description of the class of distributions Pn. It will be ∗ shown that the statement of Theorem 2 holds when Pn is replaced with either the Harmonic(n) or Zipf(n, s) distributions as long as s → 1 for the latter.

Example 5. Given n ∈ N, let Qn denote the Harmonic(n) distribution on [n] so that 1 Qn (i) = 1{i∈[n]} Pn 1 . i i=1 i

In order to apply Theorem 2 to Qn, it will be shown that the terms εi,n corresponding to Pn = Qn satisfy 1 1 (2), (5), and (6). If i ∈ [n], then (1) implies εi,n = Pn 1 − n . The left hand side of (2) is i i=1 i

j n k j n k p1···pk p1···pk X X  1 1  lim εlp1···pk,n = lim n − n→∞ n→∞ lp ··· p P 1 n l=1 l=1 1 k i=1 i  j k  n j n k P p1···pk 1 l=1 l p1···pk = lim  n −  n→∞  P 1  p1 ··· pk i=1 i n 1 1 = − p1 ··· pk p1 ··· pk = 0. The left hand side of (5) becomes

n n b p c b p c X X  1 1  ε = − lp,n lp Pn 1 n l=1 l=1 i=1 i n j n k Pb p c 1 i=1 i p = Pn 1 − p l=1 i n 1  n − 1 ≤ − p p n 1 ≤ n 1 ≤ , p A GENERALIZATION OF THE ERDŐS-KAC THEOREM 9

so (5) holds with C = 1. Further, the left hand side of (6) becomes

j n k j n k p1···pk p1···pk X X  1 1  ε = − lp1···pk,n lp ··· p Pn 1 n l=1 l=1 1 k i=1 i j k n j n k P p1···pk 1 i=1 i p1···pk = Pn 1 − p1 ··· pk i=1 i n 1  n − 1 ≤ − p1···pk p1 ··· pk n 1 = , n so (6) holds with D = 1. This proves the analogue of the Erdős-Kac Theorem in the case of an integer-valued random variable with the Harmonic(n) distribution by Theorem 2.  1 Example 6. Given s > 1, denote by Zs the Zeta(s) distribution so that for any j ∈ N, Zs (j) = jsζ(s) , P 1 where ζ (j) = j≥1 js denotes the Riemann zeta function. Since Theorem 2 involves distributions defined Pn 1 on [n], restrict the Zeta(s) distribution to [n] and then normalize by dividing by i=1 isζ(s) . I.e., for j ∈ [n], 1 s : j ζ(s) Zs,n (j) = Pn 1 i=1 isζ(s) 1 = ; s Pn 1 j i=1 is

and Zs,n is known as the Zipf distribution with parameters n and s. As s → 1, the Zipf(n, s) distribution Zs,n converges in distribution to the Harmonic(n) distribution Qn. By Example 5, this ensures that constraints (2), (5), and (6) will hold for Zs,n for s sufficiently close to 1 and dependent on n. Therefore, the number of distinct prime factors of a Zipf(s, n) distribution tends to N (log log n, log log n) as n → ∞ and s → 1.  5. Conclusion Theorem 2 generalizes the Erdős-Kac Theorem to a larger family of distributions beyond the uniform dis- tribution, and this theorem was proved by imposing the constraints (2), (5), and (6) on Pn. There are several ways to strengthen Theorem 2. As demonstrated in §2, equation (2) is equivalent to the independence of the events p1N, . . . , pkN, with respect to limn→∞ Pn, for distinct primes p1, . . . , pk. Moreover, independence naturally leads to a Gaussian law for the limiting distribution. Therefore, by weakening (2), the limiting distribution will either not be a Gaussian or will consists of parameters other than µ = σ2 = log log n in case it is a Gaussian. Example 5 provides a technique for which Theorem 2 can be applied to a particular distribution on [n] in order to conclude that the distribution of ω (·) is asymptotically N (log log n, log log n). Providing further examples, or counterexamples, of familiar distributions will provide a better description of the class of distributions determined by (2), (5), and (6). Inequalities (5) and (6) were applied in or- der to apply the method of moments approach given by Durrett; the question remains as to the extent to which these inequalities can be weakened in order to still obtain sufficient bounds in the method of moments estimates. Another way to generalize Theorem 2 would be to incorporate it with other generalizations, e.g., [4, 5, 6, 7]. By incorporating Theorem 2 with these generalizations, further generalizations can be made in which the original setting is not [n], the underlying distribution of the random-integer is not uniform, and ω (n) can be replaced with a more general function ω (f (n)).

References [1] Durrett, R. Probability: Theory and Examples, 5th Ed, Cambridge University Press, 2019. [2] Erdős, P; Kac, M. The Gaussian Law of Errors in the Theory of Additive Number Theoretic Functions. American Journal of Mathematics. 62 (1/4): 738–742, 1940. A GENERALIZATION OF THE ERDŐS-KAC THEOREM 10

[3] Hardy, G. H.; Ramanujan, S. The normal number of prime factors of a number n, Quarterly Journal of Mathematics, 48: 76–92, 1917. [4] Liu, R. A Generalization of the Erdős-Kac Theorem and its Applications. Canad. Math. Bull. Vol. 47 (4), pp. 589–606, 2004. [5] Pollack, P., An elemental Erdős-Kac theorem for algebraic number fields. [6] Saidak, F., Arch. Math. 85, pp. 345–361, Birkhauser Verlag, Basel, 2005. [7] Sun, Y.; Wu, L., Generalization of Erdős-Kac theorem, Front. Math. China, 14(6): 1303–1316, 2019. [8] Tenenbaum, G.; Mendes-France, M. The Prime Numbers and their Distribution. Providence, RI: Amer. Math. Soc., 2000.