Characteristic functions

1/13/12

Literature Rick Durrett, Probability: Theory and Examples, Duxbury 2010. Olav Kallenberg, Foundations of Modern Probability, Springer 2001, Eugene Lukacs, Characteristic Functions, Hafner 1970. Robert G. Bartle, The Elements of Real Analysis, Wiley 1976. Michel Lo`eve, , Van Nostrand 1963.

Contents

1 Definition and basic properties 2

2 Continuity 4

3 Inversion 7

4 CLT 9 4.1 The basic CLT ...... 9 4.2 Lindeberg-Feller Condition ...... 11 4.3 Poisson convergence ...... 14 4.4 Exercises ...... 17

5 Poisson random measure 19 5.1 Poisson measure and integral ...... 21 5.2 About stochastic processes ...... 22 5.3 Classical Poisson Process ...... 24 5.4 Transformations of Poisson process ...... 26 5.4.1 Nonhomogeneous Poisson process ...... 26 5.4.2 Reward or compound Poisson process ...... 27 5.5 A few constructions of Poisson random measure ...... 29 5.5.1 adding new atoms ...... 29 5.5.2 gluing the pieces ...... 30 5.5.3 using a density of a random element ...... 31 5.6 Exercises ...... 32 5.7 Non-positive awards ...... 34 5.8 SSα - symmetric α-stable processes ...... 37

1 5.9 Exercises ...... 39

6 Infinitely divisible distributions 40 6.1 Preliminaria ...... 40 6.2 A few theorems ...... 41 6.3 A side trip: decomposable distributions ...... 42 6.4 ID of Poisson type ...... 44 6.5 L´evy-Khinchin formula ...... 45 6.6 Exercises ...... 48

1 Definition and basic properties

Let µ be a of a X. The characteristic function, a.k.a. Fourier transform, is the complex valued one-parameter function ∫ µˆ(t) = φ(t) = eıtx µ(dx) = E eitX . R L Rd Similarly we⟨ define⟩ the ch.f. of a probability distribution µ = (X) on or in a Hilbert space where tx = t, x is the inner product. The definition applies also to finite measures, even to signed measures of bounded variation. The term “characteristic function” is restricted to probability measures.

Proposition 1.1 Every ch.f. φ(t) = µb(t) = E eitX has the properties:

1. φ(0) = 1;

2. |φ| ≤ 1;

3. φ is uniformly continuous on R.

4. φ is semi-positive definite, i.e., ∑ ∑ φ(tj − tk) zj zk ≥ 0, for every finite sets { tj } ⊂ R, { zj } ⊂ C j k

Proof.

(3): |φ(s) − φ(t)| ≤ E eisX − eitX ≤ E 1 − ei(s−t)X ≤ E 1 ∧ |s − t|X.

∑ 2 ∑ ∑ itj X (4): 0 ≤ E zjE e = φ(tj − tk)zjzk. j j k A probabilist should know ch.fs. of basic probability distributions by heart and how they behave under simple transformations. To wit:

2 Proposition 1.2 1. φaX (t) = φX (at), hence φ−X = φ.

2. A convex combination of ch.fs. is a ch.f.

3. Hence, given a ch.f. φ, ℜφ = (φ + φ)/2 is a ch.f.

4. The finite product of ch.fs. is a ch.f. Namely,

φ ··· φ = φ ′ ··· ′ , X1 Xn X1+ +Xn

where Xk’s are independent copies of Xk. In other words,

µˆ1 ··· µˆn = (µ1 ⊗ · · · ⊗ µn)ˆ.

5. Hence, given a ch.f. φ, |φ| and the natural powers φn and |φ|n are ch.fs.

D 6. a ch.f. is real if and only if1 X is symmetric, i.e. X = −X.

Notice that We will present examples as needed.

Example 1.3 A “duality”. 2(1 − cos t) 1 − cos x The triangular density (1 − |x|) has the ch.f. . The Polya density has the + t2 πx2 ch.f. (1 − |t|)+. 1 The symmetrized exponential distribution with the density e−|x|/2 has the ch.f. . The ch.f. 1 + t2 1 of the Cauchy density equals e−|t|. π(1 + x2)

Using the idea from the proof of (3) of Proposition 1.1, for a family µα = L(Xα) we obtain the upper estimate that involves the standard L0-metric:

sup |φα(s) − φα(t)| ≤ sup ∥(s − t)Xα∥0 α α

0 Corollary 1.4 If the family { µα } is tight (i.e., { Xα } is bounded in L ), then { φα } is uniformly equi-continuous.

The opposite implication is also true.

1only the “if” part is obvious now

3 Lemma 1.5 Consider µ = L(X), φ =µ ˆ. Then, for r > 0, ∫ r 2/r (1) P(|X| ≥ r) = µ[−r, r]c ≤ (1 − φ(t)) dt, 2 −2/r ∫ 1/r (2) P(|X| ≤ r) = µ[−r, r] ≤ 2r |φ(t)| dt. −1/r Proof. W.l.o.g. we may and do assume that r = 1 (just consider X/r and change the variable in the right hand side integrals).

(1): By Fubini’s theorem the right hand side equals ∫ ( ) 2 ( ) 1 itX sin 2X E 1 − e dt = E 2 1 − ≥ E 1I{|X|≥1} = P(|X| ≥ 1). 2 −2 2X (2): In virtue of Fubini’s theorem the left hand side is estimated as follows, using the formula for the ch.f. of the triangular density: ∫ ∫ ∫ − 1 ≤ 2(1 cos X) − | | itX − | | ≤ | | E 1I{|X|≤1} E 2 = E (1 t )+e dt = (1 t )+φ(t) dt ϕ(t) dt. 2 X R R R

Corollary 1.6 If a family { φα } of ch.fs. is equicontinuous at 0, then µα is tight.

| − | | | Proof. Let ϵ > 0 and δ > 0 be such that supα 1 φα(t) < ϵ/2 whenever t < δ. Let r0 = 2/δ. Then (1) in the Lemma entails

sup P(|Xα| ≥ r) ≤ ϵ for r > r0 α

2 Continuity

Theorem 2.1 (L´evyContinuity Theorem)

For ch.fs. φn = µcn and φ0 = µc0 the following are equivalent:

1. φn → φ0 pointwise;

w 2. µn → µ0;

3. φn → φ0 uniformly on every interval.

Proof. (2) ⇒ (1) follows by the definition of weak convergence and (3) ⇒ (1) is obvious.

The remaining nontrivial implications (1) ⇒ (2) and (2) ⇒ (3) would be much easier to prove if the measures would have the common bounded support, i.e., underlying random variables were

4 bounded. However, each of the assumptions implies that the family { µn } is tight, i.e. they are almost supported by a compact set.

(1) ⇒ (2): Assume the point convergence of ch.fs., which means that µnet → µ0et for special itx functions et(x) = e , and thus for their finite linear combinations, forming an algebra A. We must show that µnf → µ0f for every continuous bounded function on R.

We infer that { µn } is tight. Indeed, let ϵ > 0 and choose r > 0 such that |1 − φ(t)| < ϵ/4 for every t ∈ [−r, r]. By Lemma 1.5.1 and the Dominated Convergence Theorem ∫ ∫ 2/r 2/r c r r lim sup µn[−r, r] ≤ lim sup (1 − φn(t)) dt = (1 − φ(t)) dt < ϵ/2 n n 2 −2/r 2 −2/r

Then there is n0 such that c sup µn[−r, r] < ϵ. n>n0 At the same time there is r′ > 0 such that

′ ′ c sup µn[−r , r ] < ϵ n≤n0 Taking R = r ∨ r′, c sup µn[−R,R] < ϵ. n For any continuous bounded complex function h on R

∫ ∫

− ≤ || || h µn h µ 2 h ∞ ϵ. (2.1) [−R,R]c [−R,R]c The Stone-Weierstrass Theorem (cf., e.g., Bartle, Theorem 26.2) says:

If K ⊂ Rd is compact, and A is a separating algebra with unit that consists of complex functions on K, then every continuous function on K can be uniformly approximated by members of A.

Take K = [−R,R] and the algebra A, defined above. In virtue of the Stone-Weierstrass Theorem there is g ∈ A such that ||f − g||K < ϵ. Hence, using (2.1) for h = f and for h = g,

− ≤ − ≤ − (µn µ0)f (µn µ0)f 1IKc + (µn µ0) f1IK

≤ || || − − − − 2( f ∞ϵ + (µn µ0)()f g )1IK + (µn µ0)g1IKc + (µn µ0)g

≤ 2||f||∞ + 2 + 2||g||∞ ϵ + (µn − µ0)g

Let n → ∞ and then ϵ → 0.

(2) ⇒ (3): Assume the weak convergence, consider an interval [−T,T ], and let ϵ > 0. So,

{ µn : n ≥ 0 } is tight, hence there is r > 0 such that

∫ ϵ itx ≤ − c ≤ sup e µn(dx) sup µn[ r, r] . (2.2) n≥0 [−r,r]c n 4

5 Then

∫ ∫ ∫ r | − | ≤ itx itx itx − φn(t) φ(t) e µn(dx) + e µ0(dx) + e (µn µ0)(dx) = A + B + C [−r,r]c [−r,r]c −r

By (2.2), the sum A + B of the two first terms is bounded by ϵ/2.

To estimate the third term, consider a partition (xk) of [−r, r] of mesh < ϵ/(8T ), chosen from the continuity set of µ0. In particular, we may enlarge the interval [−r, r], so −r, the first point of the partition, and r, the last point of the partition, are also continuity points. In short, ∫ ∫ r ∑ xk = − r k xk−1

itx Adding and subtracting the term e k on each interval (xk−1, xk], C is bounded from above by the following expression:

∫ ∫ ∑ xk ( ) ∑ xk ( ) ∑ itx − itxk itx − itxk − e e µn(dx) + e e µ0(dx) + µn(xk−1, xk] µ0(xk−1, xk] k xk−1 k xk−1 k

itx itx Since e − e k ≤ t|x − xk|, hence

∑ ∫ ∑ ∫ xk ( ) ϵ xk ϵ eitx − eitxk µ (dx) ≤ µ (dx) ≤ , n 8 n 8 k xk−1 k xk−1 so the sum of the first two terms in the latter estimate is less than ϵ/4. For the third term, choose n0 such that, for every n ≥ n0,

∑ ϵ µ (x − , x ] − µ(x − , x ] < . n k 1 k k 1 k 4 k

So, for every ϵ > 0 and every interval [−T,T ] there is n0 such that for every t ∈ [−T,T ] there holds

|φn(t) − φ(t)| < ϵ, which completes the proof of (2) ⇒ (3).

Corollaries and remarks

1. The L´evyContinuity Theorem easily extends to Rd. In the language of random vectors

D D d Xn → X ⇐⇒ a · Xn → a · X, a ∈ R

2. L´evyUniqueness Theorem φ = φ0 ⇐⇒ µ = µ0.

3. The assumption that the limit φ0 is a ch.f. can be relaxed.

It suffices to assume that φ0 is continuous at 0, then it will be a ch.f. of some measure µ0.

6 This follows by Prokhorov’s theorem. The continuity at 0 is necessary. For example, the ch.f. − sin nt of the uniform distribution on [ n, n] is nt , which converges to 1I{0}(t). This sequence of measures is not tight. It generates a continuous functional on Cb(R), a sort of generalized integral akin to C´esarosum: ∫ 1 n Λf = lim f(x) dx n 2n −n which does not correspond to a measure.

4. The L´evyContinuity Theorem extends to bounded measures, also to signed measures with

bounded variation, after the weak convergence is augmented by the condition limn µnR = µR.

3 Inversion

Define the signum function as

σ = sign = 1I(0,∞) − 1I(−∞,0).

The following formula involves an improper integral of a function that is not integrable: ∫ ∞ sin ux dx = πσ(u). −∞ x

Its value follows from the Cauchy theorem that states that an analytic function f(z) on an open simply connected domain in the complex plane entails the curve integral vanishing over a rectifiable ̸ eiz simple closed curve. So, choosing u = 0 and then u = 1, f(z) = z is analytic on the complement of any closed disk centered at the origin. Denote by S(r) the semidisk |z| ≤ r, ℑz ≥ 0, and let C = C(ϵ, r) be the boundary of S(r) \ S(ϵ), oriented counterclockwise. Then, using the standard parametrization of four fragments - two segments and two semicircles, we have I ∫ ∫ ∫ ∫ ∫ r ix π −ϵ ix 0 ∞ e iθ e iθ sin x 0 = f(z) dz = dx + i eire dθ + dx + i eiϵe dθ → i − iπ C ϵ x 0 −r x π −∞ x as ϵ → 0, r → ∞. This approach allows to define the integral ∫ ∫ ∞ iut iut e def e dt = lim dt = πσ(u) → −∞ it ϵ 0 |t|>ϵ it

Alternatively, we may consider a more standard option ∫ ∞ eiut − eivt dt = π (σ(u) − σ(v)) −∞ it which entails ∫ ∞ ei(x−a)t − ei(x−b)t dt = 2π1I(a,b)(x) + π1I{a,b}(x) (3.1) −∞ it

7 As noted previously, the ch.f. of the uniform distribution on [−T,T ] sin T t υ (t) = → 1I{ }(t) as T → ∞. T T t 0 ∑ ∑ ≥ ∞ Hence, for a bounded discrete measure µ = n anδxn , i.e. for which an 0 and n an < , ∫ 1 T ∑ sin T (x − x ) −itxk n k lim µˆ(t)e = lim an = ak. T →∞ 2T T →∞ T (x − x ) T n n k

On the other hand, ∫ 1 ∞ e−itv ∑ µˆ(t) dt = anσ(xn − v), π ∞ it n whence, for u < v, ∫ ∞ −itu −itv ∑ 1 e − e σ(xn − u) − σ(xn − v) µ { u, v } µˆ(t) dt = an = µ(u, v) + . (3.2) 2π ∞ it 2 2 n

Since µ = µd + µc, with the pure discrete and pure continuous part, we may consider only the latter.

Theorem 3.1 (Inversion Theorem) Let µ be an atomless bounded measure. Then, for every a < b, ∫ 1 ∞ e−iat − e−itb µ(a, b) = µˆ(t) dt 2π −∞ it Proof. By Fubini’s theorem for improper integrals (Exercise 2) we rewrite the right hand side and use (3.1): ∫ ∫ ∫ ∞ 1 ∞ eit(x−a) − eit(x−b) ∞ dt = 1I(a,b)(x) µ(dx) = µ(a, b). −∞ 2π −∞ it −∞

Corollary 3.2 If φ =µ ˆ is integrable, then µ is absolutely continuous and its density f(x) is continuous: ∫ 1 ∞ f(x) = φ(t) e−itx dt. 2π −∞ Proof. Choose a = x, b = x+h in the theorem, divide by h, and let h → 0. The proof of continuity is left as an exercise.

8 4 CLT

4.1 The basic CLT

A ch.f. can be perceived as a path in the unit disk of the complex plane. The ch.f. eita of a point mass δa is a periodic circular path, visiting the point 1 infinitely often. So does the ch.f. of a discrete measure with finitely many co-rational atoms xn (that is, for some t, txn is an integer for every n). Periodicity will result from a lattice distribution of atoms (i.e., when they form an arithmetic sequence). Otherwise, only lim sup |φ(t)| = 1 is certain (Lukacs, 2.2). t→∞ On the other hand, lim |φ(t)| = 0 indicates an absolutely continuous (atomless) measure (ibid.) t→∞ It follows immediately that the existence of the k-th moment of µ entails the existence of the k-th derivative ofµ ˆ. The inverse implication is not quite simple and we will not go there (but see Exercise 3b).

For a complex valued function g ∈ Cn+1(R), Taylor’s theorem states that

∑n g(k)(0) g(x) = xk + R (x). k! n k=0 Among various versions of the remainder, we choose the integral form: ∫ x 1 (n+1) n Rn(x) = g (t)(x − t) dt n! 0 The formula is true under weaker assumptions. For example, it suffices to assume that the n-th derivative is absolutely continuous. Then the (n + 1)-the derivative exists in the Radon-Nikodym sense and is integrable on every bounded interval. However, we are interested only in the smooth function g(x) = eix, with a simple remainder that can be further refined by integrating by parts: ∫ ( ∫ ) n+1 x n x i it n n x 1 it n−1 Rn(x) = e (x − t) dt = i − + e (x − t) dt . n! 0 n! (n − 1)! 0 Hence we obtain two upper estimates that we merge to one: |x|n+1 2|x|n |R (x)| ≤ ∧ n (n + 1)! n! which is bounded by the second term for |x| > 2(n + 1).

n k Corollary 4.1 Let E |X| < ∞, φ denote the ch.f. of X, and mk = E X , k = 0, ..., n. Then

∑n ikm |tX|n+1 2|tX|n φ(t) = k tk + R (t), where |R (t)| ≤ E ∧ k! k k (n + 1)! n! k=1 E.g., for n = 2, t2 φ(t) = 1 + it E X − E X2 + R (t) 2 2

9 where |tX|3 |R (t)| ≤ E ∧ |tX|2. 2 6

1 We hardly ever need this “precision” with 6 . Let’s make it cruder and simpler:

2 2 |R2(t)| ≤ E |tX| (|tX| ∧ 1) ≤ cT E X (|X| ∧ 1), (4.1)

2 where |t| ≤ T and cT = T (T ∨ 1).

Note the immediate application that involves the standard Gaussian distribution N(0, 1):

1 2 2 with the density √ e−x /2 and the ch.f. e−t /2. (4.2) 2π

The Central Limit Theorem ∈ 2 2 Let Xn L be i.i.d. with ch.f. φ. W.l.o.g. we may and do assume that E Xn = 0 and E Xn = 1. Then X + ··· + X D Y = 1 √ n → N(0, 1) n n

Proof. The ch.f. of Yn can be estimated as follows: ( ) √ t2 √ n p = φ(t/ n)n = 1 − + R (t/ n , n 2n 2

That is, we can rewrite it a ( ) a n p = 1 − n , where a → a = t2/2 n n n

Given ϵ ∈ (0, a), we find n0 such that |an − a| < ϵ for every n > n0, so ( ) ( ) a + ϵ n a − ϵ n 1 − ≤ p ≤ 1 − n n n

That is, −a−ϵ −a+ϵ e ≤ lim inf pn ≤ lim sup pn ≤ e n n −t2/2 Hence limn pn = e .

10 4.2 Lindeberg-Feller Condition

The CLT is one of the basic examples of a limit theorem that establishes a limit distribution of a sequence of random variables (Sn), subject to affine transformations: S − b n n , an where an, bn are scalar sequences, and Sn may depend on observed data, expressed as random variables, and be their function (a.k.a. a statistic), e.g., the sum, maximum, minimum, etc.

For example, under moment assumptions, the centering scalar bn could be the mean while the scaling scalar an could be the standard deviation of the transformed variable. The independence assumption may be relaxed, the moment assumptions may be dropped, so the centering and scaling constants might not be related to moments at all.

Presently we consider a random array [ξnk : n ∈ N, k ≤ n] and denote

Sn = ξn1 + ··· + ξnn.

We assume that

(1) For every n, ξn1, . . . , ξnn are independent; ≤ 2 | |2 ∞ (2) For every n and k n, E ξnk = 0, σnk = E ξnk < ; ∑n (4.3) 2 | |2 2 (3) sn = Var(Sn) = E Sn = σnk = 1. k=1 Introduce also the Lindeberg-Feller condition ∑n [ ] 2 ∀ ϵ > 0 lim ℓn(ϵ) = 0, where ℓn(ϵ) = E ξ ; |ξnj| > ϵ (4.4) n nj j=1

Theorem 4.2 Let a triangular random matrix [ξnk] satisfy (4.3). Then,

2 → (a) max σnk 0, k (b) L(Sn) → N(0, 1). if and only if the Linderberg-Feller condition (4.4) is satisfied.

Proof. Assume (4.4). Then (a) follows since

2 ≤ E ξnk ϵ + ℓn(ϵ). ∑ Consider a Gaussian random matrix [ζnk] with all characteristics (4.3), and denote Zn = k ζnk.

Clearly, Zn ∼ N(0, 1). Let |t| ≤ T .

∏ ∏ ∑ itSn − itZn itξnk − itζnk ≤ itξnk − itζnk E e E e = E e E e E e E e , k k k

11 where the last inequality for products of complex numbers from the unit disk follows by induction. Continuing, the latter term is bounded by

∑ 2 2 ∑ 2 2 itξ t σ itζ t σ E e nk − 1 + nk + E e nk − 1 + nk . 2 2 k k Using the error estimate (4.1), the first of the above terms is bounded by ∑ | |2 ∧ | | ≤ 2 cT E ξnk (1 ξnk ) ϵ cT sn + cT ℓn(ϵ) k 2 Denoting the p’th absolute moment of a N(0, 1) Gaussian r.v. by mp, the p-th moment of N(0, σ ) Gaussian r.v. ζ equals p p p p E |ζ| = σ E |ζ/σ| = mp σ Hence, the second term is bounded by ∑ ∑ ∑ 2 3 3/2 1/2 cT E |ζnk| (1 ∧ |ζnk|) ≤ cT E |ζnk| = cT m3 E σ ≤ cT m3 max σ nk k nk k k k Now, let n → ∞, and then ϵ → 0.

To show that Lindeberg’s condition is necessary, assume (a) and (b), and fix ϵ > 0 and t > 0.

Assuming (b), the weak convergence of laws Sn to the symmetric normal law, we infer that ∑ t2 ln E cos tξ = ln ℜE eitSn → − . nk 2 k We claim that ( ) ∑ t2 E 1 − cos tξ → . (4.5) nk 2 k Indeed, let’s write ∑ ∑ ( ) bn = ln E cos tξnk + E 1 − cos tξnk . k k 2 Applying the inequality | ln z + 1 − z| ≤ |1 − z| to z = E cos tξnk, and next using the estimate 2 1 − cos u ≤ u /2 (with u = tξnk), we infer that ( )2 ∑ 2 ∑ t2ξ2 t4 max σ2 ∑ |b | ≤ E (1 − cos tξ ) ≤ E nk ≤ k nk σ2 → 0, n nk 2 4 nk k k k which proves (4.5), by virtue of the assumed condition (a). On the left hand side of (4.5), consider 2 a single term with ξ = ξnk. Then, since 1 − cos u ≤ u /2 and 1 − cos u ≤ 2,

E (1 − cos tξ) = E [1 − cos tξ; |ξ| ≤ ϵ] + E [1 − cos tξ; |ξ| > ϵ] t2 ≤ E [ξ2; |ξ| ≤ ϵ] + 2P(|ξ| > ϵ) 2 t2 2 ≤ E [ξ2; |ξ| ≤ ϵ] + E ξ2, 2 ϵ2

12 where the second term was estimated with the help of Chebyshev’s inequality. In other words, 2 4 E [ξ2; |ξ| ≤ ϵ] ≥ E (1 − cos tξ) − E ξ2 t2 ( t2ϵ2 ) 2 t2 4 = 1 + E (1 − cos tξ) − − E ξ2 t2 2 t2ϵ2 → ∞ 2 → Return to ξnk, sum up along j, and let n , Then, (4.5) and the normalizing condition sn 1 imply ∑ 2 4 lim inf E [ξ ; |ξnk| ≤ ϵ] ≥ 1 − . n nk t2ϵ2 k Although t was fixed, it is arbitrary. Now, let t → ∞,

Corollary 4.3 (Lyapunov) Let ξnk fulfill assumptions of Lindeberg’s theorem, and let δ > 0. Then [ ∑ ] [ ] 2+δ d λn(δ) = E |ξnk| → 0 ⇒ Sn → ζ k

δ Proof. Indeed, ℓn ≤ λn(δ)/ϵ , so Lyapunov’s condition implies Lindeberg’s.

2 We return to the sequence of independent random variables (ξk). As before, assume E ξ = 0, σk = 2 ∞ ··· 2 ··· 2 ≤ ≤ E ξ < , Sn = ξ1 + + ξn, sn = σ1 + + σn. Let Fn = P(Sn/sn x) and Φ(x) = P(ζ x). If w Fn → Φ, it is desirable to know how fast the convergence occur. That is, estimates of

dist (Fn, Φ) are of great practical and theoretical importance, where “dist” - preferably a metric - measures the convergence. Although the L´evy-Prokhorov metric seems to be the most natural choice since it metrizes the weak convergence of measures, its specific definition makes it difficult to examine. The uniform metric is stronger but more appropriate for applications. Of course, to obtain a stronger mode of convergence, a stronger assumption is needed. Let F (x) = P(ξ ≤ x),G(x) = P(η ≤ x) be probability distribution functions. Consider

dist(F,G) = ∥F − G∥∞ = sup |F (x) − G(x)|. x Theorem 4.4 (Berry (1941), Esseen (1945), Van Beek 1972) Assume E ξ = 0, E ξ2 = 1, 2+δ and E |ξ| < ∞, for some δ > 0. Let (ξk) be independent copies of ξ. Then E |ξ|2+δ ∥F − Φ∥∞ ≤ c √ , n n where c is some universal constant, independent of n and of distribution of ξ, although it may depend on δ.

No proof will be presented here (see Durrett).

13 4.3 Poisson convergence

Let [ξnk] be a triangular array of random variables:

(1) values are whole numbers 0, 1, 2, ...;

(2) for every n, ξnk are independent; (4.6) 0 (3) max ∥ξnk∥0 → 0 as n → ∞, where ∥ · ∥0 is any L -metric. k

−tX We are free to choose ∥X∥0 = E (1 ∧ |X|), or ∥X∥0 = E (1 − e ) for a fixed t > 0. ∑ As before, denote Sn = ξnk. We will discuss the weak convergence of its distribution to the k Poisson distribution on Z+: n λ it −t µ { n } = P(ξ = n) = e−λ , µˆ(t) = eλ(e −1), µ˜(t) = eλ(1−e ). n! Lemma 4.5 Assume (4.6). Then the following conditions are equivalent:

D 1. Sn → ξ ∑ −t 2. − ln ψnk(t) → λ(1 − e ), t > 0. k ∑ ( ) −t 3. Cn = 1 − ψnk(t) → λ(1 − e ), t > 0. k Proof. The equivalence of the first two conditions follows by the L´evyContinuity Theorem for Laplace transforms ψ(t) = E e−tX , after applying the logarithm.

For u ∈ [0, 1/2] we have the identity

− ln(1 − u) = u + r(u), where 0 ≤ r(u) ≤ u2. (4.7)

To show the equivalence of the second and third condition, we apply it with u = unk = 1 − ψnk, which is arbitrarily small by the third assumption in (4.6). Then we sum up along k. That is, either we assume (3), so Cn ≤ C, or we assume (2) which yields ∑ ∑ Cn = (1 − ψnk) ≤ − ln ψnk ≤ C. k k Hence the remainder is bounded by ∑ ∑ 2 rnk ≤ (1 − ψnk) ≤ C max ∥ξnk∥0 → 0, k k k

−tX where we choose ∥X∥0 = E (1 − e ).

14 Theorem 4.6 Let a random triangular array [ξnk] satisfy (4.6) and ξ be Poisson(λ). Then ∑  →  (a) P(ξnk > 1) 0  ∑ ∑k D ⇐⇒ Sn = ξnk → ξ →  (b) P(ξnk = 1) λ  k k D Proof. The necessity. Suppose that Sn → ξ ∼ Poisson(λ), and look at the third condition in −t Lemma 4.5. For simplicity, denote s = e . So, Cn → λ(1 − s). For a single random variable X with values in Z+ we have

E (1 − sX ) = E [1 − sX ; X > 0] = E [1 − s + s − sX ; X > 0] = (1 − s)P(X > 0) + R(s) where

R(s) = E [s − sX ; X > 0] = E [s − sX : X > 1], hence(s − s2)P(X > 1) ≤ R(s) ≤ sP(X > 1).

(because X > 1 ⇐⇒ X ≥ 2). Since ∑ ∑ Cn = (1 − s) P(Xnk > 0) + Rnk(s), where Rnk(0) = 0, k k thus ∑ ∑ ∑ P(Xnk > 0) → λ and Rnk(s) → 0, hence P (ξnk > 1) → 0 k k k so (b) and (a) hold true.

The sufficiency. First, we reduce the range of r.vs. to the mere { 0, 1 }. Then (a) is trivially ′ true. Suppose that [*] “(b) is sufficient for the Poisson convergence for 0-1 r.vs.” Denote ξnk =

1I{ξnk = 1}. ∑ ′ ′ ′ →D Assume (b), which is the same for both ξnk and ξnk. So, by [*] Sn = ξnk ξ. But k ∑ ′ Sn = Sn + Rn, where Rn = ξnk1I{ξnk>1}. k ∑ ∑ ∧ ≤ ∧ By (a), using the subadditivity 1 k ck k(1 ck), ∑ ∑ ∧ ≤ ∧ → E 1 Rn E 1 ξnk1I{ξnk>1} = P(ξnk > 1) 0, k k P D i.e., Rn → 0, so Sn → ξ.

Now, assume (b) for an 0-1 array, and compute the logarithm of the Laplace transform E exp { −tSn } , −t with the notation unk = 1 − E exp { −tξnk } = (1 − e )P(ξnk = 1), using (4.7): ∑ ∑ −t − ln(1 − unk) = (1 − e ) P(ξnk = 1) + Rn, where Rn → 0 k k

15 since ∑ ≤ 2 ≤ ∥ ∥ Rn unk C max ξnk 0. k k Thus, we obtain the Laplace transform of Poisson(λ) in the limit.

Remark 4.7 In elementary probability courses the special case of i.i.d. Bernoulli ξnk’s is known as the Poisson approximation of the Binomial. Indeed, in this case Sn is binomial, where it is also assumed that pn = P(Xn1 = 1) → 0, and then (b) means that npn → λ.

Exercise. What condition imposed on pn does ensure (or, is necessary and sufficient for) the Lindeberg-Feller condition. i.e. Gaussian rather than Poisson convergence? Note that the each entry ξnk needs to be standardized to fulfill the standing assumptions for the CLT for random arrays.

16 4.4 Exercises

1. Verify the relations “density vs. ch.f.” in Example 1.3 and formula (4.2).

2. Let (S, S, µ) be a bounded measure space and f(t, s) a measurable real or complex function on R × S. Assume that

• f(·, s) is locally (i.e., on every interval) integrable functions on R for almost every s ∈ S; • f(t, ·) is µ-integrable for almost every t ∈ R; ∫ ∫ T def • The improper integral g(s) = f(t, s) dt = lim f(t, s) dt exists for almost every →∞ R T −T s ∈ S; ∫ T

• sup f(t, s)dt is µ-integrable; T −T Then ∫ (∫ ) ∫ f(t, s) µ(ds) = g(s) µ(ds). R S S 3. A discrete version of the above theorem [Abel’s convergence criterion for infinite series]. Prove:

∑N ∑ ↘ ∞ If an 0 and sup bn < , then anbn converges. N n=1 n

Hint: write dn = an−1 −an and Bn = b1 +···+bn, and split the sum (discovering and proving Abel’s “summation by parts” formula):

∑N N∑−1 an bn = aN BN − dnBn. n=0 n=0 ∑ sin an ∑ cos an (a) Let p > 0. Show that converges for every real a, and converges for np np n n a∈ / 2πZ. ∑ C (b) Let µ = δ , where C makes probabilities of the sequence (n2 ln n)−1. The first n2 ln n n n moment does not exist butµ ˆ is differentiable for t ≠ 0. Prove also the same statement

when atoms { n } oscillate, i.e., replace δn by δ(−1)nn.

4. Show that the density in Corollary 3.2 is continuous. ∏ ∏ ∑ | − | ≤ | − | 5. On the unit disk of the complex plane, k zk k wk k zk wk .

6. Find the arbitrary n-th absolute moment E |ζ|n of a standard N(0, 1) r.v. ζ. Hint: in the first semester we evaluated even moments (while studying Marcinkiewicz-Zygmund-Paley inequalities).

17 7. What condition imposed on pn in the triangular matrix of Bernoulli r.vs. (i.i.d. in each row) will ensure (or be implied by) the Lindeberg-Feller condition. i.e. Gaussian rather than Poisson convergence?

18 5 Poisson random measure

In science and beyond the most typical activity is counting. Scientists and beyondists count every- thing, stars in sky sectors, pollutant particles in water or air, bird nests per area of Alabama (or Alaska), coins in collections, gold nuggets in mines, Burmese pythons in Everglades, Occupieds per city, customers in burger joints, votes in the GOP primary per Florida county, etc.

Typically, the count involves the number of items per region, may vary from 0 through all natural numbers, and there is no reason to assume that there is a definite upper bound. All the listed - and unlisted - examples involve measurable regions - linear, planar, spatial, etc. It stands to reason to suppose that the count depend more on the measurement (length, area, volume) than on other aspects like geometry or . Also, the counts in separate regions should be independent.

Of course, both assumptions are ideal but so are all human made models.

The randomness is entailed by the random distribution of items Yn - wether arrival moments on the temporal line, or scatter points in the plane or surface, or in the space or 3D-manifold, or just in an abstract set S. So, the count is ∑ N(A) = 1IA(Yn). (5.1) n

The formula can be viewed from the measure-theoretic point of view. Denoting by δa the atomic measure at a point a, we may write ∑

N = δYn , (5.2) n ≥ and then for f 0, ∫ ∑ Nf = f dN = f(Yn). S n

Let (S, S, λ) be a σ-finite continuous measure space and (Ω, F, P) be a probability space, both entailing the L0-spaces of measurable function. While L0(Ω) with convergence in P is metrizable ∫ ∧| | ∧| | by the traditional metric E (1 X ), the analogous metric S(1 f ) dλ yields a topology essentially stronger than the convergence in measure although weaker than L1. On L∞(S) it is L1, though.

A sequence of random elements Yn in S (i.e., measurable mappings Ω 7→ S) entails a random counting measure on S, a so called point process. It is not immediately clear whether the converse is true, that is, a random counting measure requires random points to be counted.

The counting random measure is just one example with the concept of a random measure, i.e. a mapping X : S → L0(Ω, F, P) such that ∪ ∑ X An = XAn,An ∈ S are disjoint. (5.3) n n

19 The series on the right should converge in probability and, a fortiori, the convergence must be unconditional, i.e., independent of permutations of the indices. The range might be a narrower subspace of L0 such as L1 or L2. Thus, a random measure is factually a vector measure which extends the classical concept of a nonnegative countably additive set function. For example, a signed measure is an R-valued vector measure.

A deterministic control measure is a very convenient tool:

XAn → 0 ⇐⇒ λAn → 0.

Then it would suffice to introduce the random measure on a generator S0. For example, when S = Rd, and S consists of Borel sets, with a control measure it suffices to define X on simple figures such as intervals.

20 5.1 Poisson measure and integral

The function x 7→ 1 ∧ x on the positive half-line can be replaced by another more convenient function. Below we shall use ψ(x) = 1 − e−x for reasons that will soon become clear. So, for random variables, the L0-metric is

∥X∥0 = E ψ(|X|).

The mapping ξ : S → L0(Ω, F, P) is called a Poisson random measure (PRM) if

1. ξA is Poisson(λA), for every A ∈ S of finite measure λ;

2. ξA and ξB are independent if A ∩ B = ∅.

We call λ the control measure of ξ. At this moment the issue of existence is not yet resolved but properties can be easily derived.

Proposition 5.1 Let ξ be a PRM with a control measure λ. Let A1,...,An be disjoint measurable sets of finite measure. Then ξA1, ··· , ξAn are independent, and their joint Laplace transform is { } { } ∑ ∑ ( ) −itk E exp − tk ξAk = exp − 1 − e λAk . k k Proof. The Laplace transform formula follows by induction and utilizes the property of Poisson distribution: the sum of two independent Poisson random variables is again Poisson, and the parameters add up.

First, we note that ξ is factually a countably additive function (in the sense to be explained) on ∪ S S ∈ S ∈ S the δ-ring of 0 subsets of of finite measure. Let A 0 and A = k Ak, where Ak 0 are disjoint. Then ∑ ∑ ∑ ( ∑ ) E ξA = λA = λAk = E ξAk = E ξAk, or E ξ(A) − ξAk = 0 k k k k (the r.v. in parentheses is nonnegative2). That is, ξ is countably additive as a mapping with values in L1(Ω, F, P).

Corollary 5.2 From the measure-theoretic point of view the Laplace transform formula appears as the assignments ∑ ∑ ∫ 7→ def f = tk 1IAk ξf = tk ξ(Ak) = f(s) ξ(ds). S k { ∫k ( ) } E exp { −ξf } = exp − 1 − e−f(s) λ(ds) S 2why?

21 The quantity entails a complete metric vector subspace of measurable functions on S. Recall ψ(x) = 1 − e−x, x ≥ 0. ∫

∥f∥0 = ψ(|f|) dλ S { } 0 L = f ∈ L (S): ∥f∥0 < ∞ , d(f, g) = ∥f − g∥0, ∫ where simple functions form a dense subset. Write λF = S F dλ. Then the last formula in the Corollary can be rewritten as E ψ(ξf) = ψ(λψ(f)).

In other words,

∥ξf∥0 = ψ(∥f∥0) (5.4) which establishes a homeomorphism between the space of simple f’s and their Poisson integrals.

Proposition 5.3 For a Poisson random measure ξ consider the positive cone L+ = { f ∈ L : f ≥ 0 }. Then the mapping ξf, defined originally for simple functions, extends to a continuous positive-linear L 0 F mapping from + into L+(Ω, , P), and (5.4) continues to hold

Then ξ extends to a continuous linear mapping on L, defined as ξf = ξf+ − ξf−.

Proof. Let f ∈ L+ and fn ≥ 0 be increasing simple measurable functions such that f = limn fn = ∥ − ∥ → supn fn and f fn 0 0. Clearly, the well defined ξfn increase a.s. and ξfn is a Cauchy sequence in L0. Indeed, for n ≥ m, by (5.4)

∥ξfn − ξfm∥0 = ψ(∥fn − fm∥0) → 0

So ξf = limn ξfn exists in probability and hence a.s. (since the sequence increases). Further, (5.4) is preserved in the limit, which ensures the other listed properties,

5.2 About stochastic processes

The definition of PRM contains a family of finite dimensional distributions

{ ∈ S } µA1,...,An : disjoint Ak .

Although it is easy to create a random vector (ξ1, . . . , ξk) with independent Poisson (λAk) compo- nents, the existence of a robust mapping ξ : S → L0 is not immediately obvious. It is a special case of a more general problem.

22 Let T be a nonempty set and X = (Xt : t ∈ T ) be a family of real random variables, a.k.a. stochastic process. By finite dimensional distributions (FDD) of X we understand the Borel probability measures ( ) L B Rn ∈ N ∈ µt1,...,tn = Xt1 ,...,Xtn on ( ), n , t1, . . . , tn T,

τ µτ in short, where τ = { t1, . . . , tn }. So, more precisely, µτ is a Borel measure on R . We notice the obvious relation for m > n ( ) ( ) ∈ ∈ ∈ R ∈ R ∈ ∈ ∈ B R P Xt1 A1,...,Xtn An,Xtn+1 ,...,Xtm = P Xt1 A1,...,Xtn An ,Ak ( )

In terms of probability measures, we say that their family is consistent. That is, for finite τ ′ ⊃ τ ( ) τ ′\τ τ µτ ′ A × R = µτ (A),A ∈ B(R ).

So, the passage from the family of random variables (Xt) to the consistent family of multidimen- sional probability distribution (µτ ) is immediate but the inverse implication is highly nontrivial, and is known as Kolmogorov Extension Theorem (cf., e.g. Theorem 6.16 in Kallenberg, or the special case in Appendix A7 in Durrett).

Even the existence of an infinite sequence of independent random variables belongs to this category.

However, we introduced a countable product measure in the first semester. That is, if (Ωk, Fk,Pk) is an infinite sequence of probability spaces, then there is a product measure P = P1 ⊗ P2 ⊗ · · · on ∏ (⊗, F) = ( Ωk, F1 ⊗ F2 ⊗ · · ·). k

In particular, if µk are Borel probability measures on R, then the well defined product measure

N N µ = µ1 ⊗ µ2 ⊗ · · · on (R , B(R )) entails independent random variables

N Xn(ω) = ωn, ω = (ωn) ∈ R .

Therefore, any constructive and intuitive approach should be appreciated.

23 5.3 Classical Poisson Process

There is one-to-one correspondence between increasing sequences yn on [0, ∞) and nondecreasing piecewise constant right continuous functions n(t) with unit jumps (CF - for “counting functions”) ∞ on [0, ): ∑ given yn ↗ put n(t) = 1I[0,t](yn); n th given a CF n(t) put yn = n jump of n(t); In other words, for every t ≥ 0 and n = 0, 1, 2, ...

n(t) ≥ n ⇐⇒ yn ≤ t, or n(t) = sup { k : yk ≤ t } . (5.5)

For any nonnegative Borel function f on [0, ∞), the Lebesgue-Stieltjes integral is well defined although it could be infinite ∫ ∞ ∑ nf = f(t) dn(t) = f(yn). 0 n Hence any increasing random sequence Yn entails the CF Nt and a random counting measure NA (5.1), and then the integral of a nonnegative function. Conversely, a counting random measure defines the random CF Nt, and its discontinuities define Yn’s. We shall call them signals.

If (Vn) are i.i.d. and Yn = V1 + ... + Vn, then the CF Nt defined by (5.5) is called a renewal process. The most important case involves the exponential distribution of the summands Vk.

Denote the parameter, also called the intensity, by λ. We will show that Nt induces a Poisson random measure with the scaled Lebesgue measure as a counting measure.

Proposition The r.v. Nt has the Poisson(λt) distribution.

Proof: Yn has the Gamma distribution with the density λn f (x) = xn−1 e−λx n (n − 1)!

Hence, by conditioning and since Yn+1 = Yn + Vn+1

P(Nt = n) = P(Nt ≥ n, Nt < n + 1) = P(Yn ≤ t, Yn + Vn+1 > t) ∫ t = P(Vn+1 > t − x)fn(x) dx 0 ∫ λn t (λt)n = e−λ(t−x)xn−1 e−λx dx = e−λt (n − 1)! 0 n!

We call Nt the Poisson process. Note that the name does not and should not apply to the sequence Yn, although it determines Nt. For that reason the terminology is often abused, and the sequence is improperly called “the Poisson process”.

24 Define the age time from the given moment to the last signal that precedes it, and the excess time from the moment t to the next signal:

− − At = t YNt ,Wt = YNt+1 t

In what follows the crucial role is played by the “lack of memory” of an exponential distribution:

P(U1 > t + s|U1 > t) = P(U1 > s).

Proposition. The excess time Wt is independent of Nt and L(Wt) = L(U1). Proof: Compute ∫ t P(Yn+1 ≥ t + s, Yn ≤ t) = P(Un+1 ≥ t + s − x) fn(x) dx 0 ∫ λn t (λt)n = e−λ(t+s−x) xn e−λx dx = e−λ(t+s) (n − 1)! 0 n!

Since { Nt = n } = { Yn ≤ t, Yn+1 ≥ t }, this yields ≥ ≤ ≥ | ≥ | P(Yn+1 t + s, Yn t) −λs P(Wt s Nt = n) = P(YNt+1 t + s Nt = n) = = e P(Nt = n) and so ∑ −λs P(Wt ≥ s) = P(Wt ≥ s|Nt = n)P(Nt = n) = e , n which also entails the independence: P(Wt ≥ s|Nt = n) = P(Wt ≥ s).

Corollary 5.4

1. The Poisson process starts afresh and independently after any time t.

More precisely, given t > 0, define i.i.d. exp(λ) r.vs:

′ − ′ − ′ − U1 = Wt = YNt+1 t, U2 = YNt+2 YNt+1, ...Uk = YNt+k YNt+k−1, ...

and also ′ ′ ··· ′ Yn = U1 + + Un. ′ ′ ≥ Then, by Proposition, U1 is independent of Nt, so it is independent of (Uk, k 2). Then, for m ≥ 2, ∑ ′ ≤ − ≤ | P(Uk uk, k = 2, . . . , m) = P(Yn+k Yn+k−1 uk, k = 2, . . . , m Nt = n)P(Nt = n) n

= P(Uk ≤ uk, k = 2, . . . , m) = P(U2 ≤ u2) ··· P(Um ≤ um)

25 2. N(t, t + s] = Nt+s − Nt is independent of Nt and is distributed as Ns: ∑ ∑ ′ N(t, t + s] = 1I(t,t+s](Yn) = 1I(t,t+s](Yn). n n In other words, the distribution of increments depends only on their durations not on their locations, and we often say that the process is stationary.

By induction, the independence holds true for any finite number of disjoint of increments.

3. Nt entails a Poisson measure that starts with N(a, b] = Nb − Na.

4. The age time At and excess time Wt have the same exp(λ) distribution.

Hence we encounter a paradox: if at time t > 0 the previous and the next signals are observed, then the epoch - the time distance between them - has the expectation twice as long than the average epoch between two arbitrary signals: ( ) ( ) − E UNt+1 = E YNt+1 YNt = E At + Wt = 2 E U1

Is it really a paradox?

5.4 Transformations of Poisson process

5.4.1 Nonhomogeneous Poisson process

Let ϕ : (0, ∞) → (0, ∞) be a strictly monotonic function with the inverse Λ = ϕ−1. We assume the strict monotonicity for the sake of clarity of presentation. Otherwise, for function that may be piecewise constant we would have to use the generalized inverse.

Given a Poisson process Nt with unit intensity, transform its Gamma-distributed signals Yn into

Zn = ϕ(Yn), and denote the new counting process by Mt or the counting measure by MA. That is ∑ ∑ ∑ MA = 1IA(Zn) = 1IA(ϕ(Yn)) = 1IΛ(A)(Yn) = NΛ(A). n n n

Hence MA1,...,MAn are independent when A1,...,An are disjoint, and M(A) is Poisson with parameter Λ(A). Notice that ΛA is a measure, e.g.

Λ(a, b] = |Λ(b) − Λ(b)|.

Thus M is a Poisson random measure on the range ϕ(0, ∞). If the measure Λ is absolutely continuous with respect to the Lebesgue measure, then denoting its density by λ(t), also called the intensity function we obtain ∫ ΛA = λ(t) dt. A Examples.

26 1. Poisson process often serves as a model of customer service. However, its original setup would require the 24/7 servicing, in contrast to the usual piecewise service periods as in banking hours 9-5 for example. So, we can use two-valued { 0, λ } intensity function, with hours as time units: ∑∞ λ(t) = λ 1I(9+24n,17+24n](t). n=0 The above “square wave” is just one example of a periodic intensity function. √ 2 2 2 2. Say, ϕ(t) = t , t > 0. Then signals are Y1 ,Y2 , .... Then Λ(t) = t and its intensity is 1 λ(t) = Λ′(t) = √ . 2 t

For a general power ϕ(t) = tp, λ(t) = t1/p−1/|p|. E.g., the transformation ϕ(t) = 1/t entails 2 the intensity λ(t) = 1/t , so with probability 1 the number of signals Zn in every half-line [a, ∞) with a < 0 is finite.

5.4.2 Reward or compound Poisson process

Write the Poisson process, Poisson measure or Poisson integral again: ∑ ∑ ∑ Nt = 1I[0,t](Yn),NA = 1IA(Yn), Nf = f(Yn). n n n

Let Rn be i.i.d. r.vs. (“rewards”), independent of N that replace unit size jumps by Rn’s. Define ad rewrite ∑ ∑ ∑ ∑Nt

Mt = Rn 1I[0,t](Yn) = Rn 1I{Yn≤t} = Rn 1I{Nt≥n} = Rn (5.6) n n n n=1 ∑ 0 with the convention n=1 = 0. We may also write ∑ Mf = Rn f(Yn). n Let us compute the Laplace transform using Fubini’s Theorem (subscripts at the expectations indicate the suitable integrals) and abbreviating R1 = R: { } ∑ ∏ −Mf − { − } E e = E N E R exp Rnf(Yn) = E N E R exp Rf(Yn) . n n Introducing the function − { − } g(x) = ln E R exp Rf(x) we obtain the formula (no need to use the subscript anymore) { } { } ∏ ∑ ∫ ∞ ( ) −Mf −Ng −g(x) E e = E exp { −g(Yn) } = E exp − g(Yn) = E e = exp − 1 − e dx n n 0

27 Now, removing the function g we arrive at the identity { ∫ ∞ ( ) } E e−Mf = exp − E 1 − e−Rf(x) dx . (5.7) 0 One more time, denote by µ the probability distribution of R, supported by [0, ∞), and let S = [0, ∞)2 with Borel sets and the product measure λ = µ⊗Leb (“Leb” of course denotes the Lebesgue measure). Also, define the positively linear operator

[0, ∞)2 ∋ (u, x) = s 7→ Lf(s) = u f(x).

Thus, finally we see that the “reward Poisson process” is factually identical (in regard to its FDD) D with a Poisson random measure ξ on the product space, Mf = ξT f: { ∫ ( ) } E e−Mf = exp − 1 − e−Lf(s) λ(ds) = E e−ξ Lf . S Example 5.5 Let us examine one more time formula (5.6)

∑Nt Mt = Rn n=1

An alternative name for Mt is a “compound Poisson process”.

1. Let Rn be i.i.d. Bernoulli with P(Rn = 1) = p. That is, with probability p a signal is recorded (or taken, or colored) while with probability 1 − p the signal is neglected (or left

out, or whitened out). Then Mt is a Poisson process with intensity pλ, a “thinned” Poisson process.

In other words, if X is a binomial r.v. with parameters n and p, bin(n, p) and then n is “randomized” by a Poisson random variable N independent of X, then b(N, p) is Poisson.

2. The remaining process with “rewards” 1−Rn is also Poisson with intensity (1−p)λ. Further, both processes are independent.

This property can be generalized to a finite decomposition of the unit (as in 1 = R +(1−R )). ∑ n n d ̸ To wit, let R = j=1 Rj, where RjRk = 0 for j = k, and Rj is Bernoulli with parameter

pj. We may think of a wheel-of-fortune like spinner, with slices marked by numbers or colors

j = 1, ..., d. Let (Rnj) be independent copies of (Rj). When a signal Yn of a Poisson process is recorded, the spinner is spun and the signal is marked by the outcome shown, one between 1 and d. We claim that the resulting process ∑ Mjf = Rnj f(Yn) n

are independent Poisson with parameters pjλ.

28 5.5 A few constructions of Poisson random measure

5.5.1 adding new atoms

Using the setting of formula (5.2), we look at the reward Poisson process in Example 5.5 as an extension of an already defined Poisson random measure on (S, S, λ) to a product space S × T , where (T, T , µ) is a probability space, and τn are i.i.d. random elements in T with probability distribution µ: ∑

M = δ(Yn,τn). n

We may think of τn’s as “marks”, that are not necessarily numbers. That is why this Poisson random measure (as we will see) is often called a marked Poisson process.

In the integral form, for a function F (t, y) = α(t)g(y) with separable variables ∑ MF = α(τn) f(Yn). n

So, the reward Poisson measure is just the special case of the marked Poisson measure, Rn = α(τn). For a general F , ∑ MF = F (τn,Yn). n It remains to verify that M is a Poisson random measure. ∏ −MF −F (τ,Yn) E e = E N E τ e . n Denote g(y) = − ln E F (τ, y).

So { } ∑ { ∫ ( ) } −MF −Ng −g(y) E e = E exp − g(Yn) = E e = exp − 1 − e λ(dy) n S { ∫ ∫ ( ) } = exp − 1 − e−F (t,y) µ(dt) λ(dy) . S T Hence M is a Poisson random measure on S × T with intensity λ ⊗ µ.

Example: A Poisson random measure in Rd. We shall use spherical coordinates (when d = 2 they are called polar coordinates) (r, t) where r ≥ 0 and t is a point from the (d − 1)-sphere

T = Sd−1 (e.g., the unit circle when d = 2, the two-dimensional unit sphere when d = 3, etc.). Let

Yn be signals of a unit intensity Poisson process on [0, ∞) and let independent τ n, also independent of N, be uniformly distributed on Sd−1. For a cone C described by r ≤ a, t ∈ B, where B is a

Borel subset of Sd−1,

1IC (r, t) = 1I[0,a](r) 1IB(t),

29 the Poisson random variable MC has the expectation a · |B| = Lebd(C) (where |B| denotes the normalized Lebesgue measure on the sphere). So, the intensity of M is the Lebesgue measure in Rd.

5.5.2 gluing the pieces

The last case of Example 5.5 can be generalized (and simplified at the same time) as follows. Let (S, S, µ) be a probability space and let X :Ω → S be a random element with distribution µ. That is, P(X ∈ A) = µA for A ∈ S. Let Xn be its independent copies and let N be a unit intensity

Poisson process with signals (Yn), independent of (Xn). Define

∑N1 ∑N1 ∑∞ ∑∞

MA = 1IA(Xn) or Mf = f(Xn) = f(Xn)1I{N1≥n} = f(Xn)1I[0,1](Yn), n=1 n=1 n=1 n=1 where A ∈ S or f ≥ 0 is a Borel measurable function on S. For simplicity, denote I = 1I[0,1]. Then ∏ −Mf −f(Xn)I(Yn) E e = E N E e . n Wit the help of the function g(y) = − ln E e−f(X)I(y). the latter formula reads { ∫ } ∏ ∑ ∞ ( ) −Mf −g(Y ) − g(Y ) −Ng −g(y) E e = E e n = E e n n = E e = exp − 1 − e dy . n 0

Removing g and bringing up I = 1I[0,1], the last expression { ∫ } { ∫ } ∞ ( ) 1 ( ) = exp − 1 − E e−f(X)I(y) dy = exp − 1 − E e−f(X) dy 0 0 { ∫ ( ) } = exp − 1 − e−f(s) µ(ds) . S In other words, M is a Poison measure on (S, S, µ).

Now let (S, S, λ) be an infinite but σ-finite measure space. Assume that is continuous (atomless). ∪ Let S = Sk, where Sk ∈ S are probability spaces. Create independent Poisson meaures Mk on S S S ∩ | (Sk, k, λk), where k = Sk and λk = λ Sk according to the previous construction. Finally, there comes the Poisson random measure with intensity λ: ∑ MA = Mk(A ∩ Sk). k

30 5.5.3 using a density of a random element

Let (S, S, λ) be an atomless infinite σ-finite measure space and τ be a random element in S whose distribution is absolutely continuous with respect to λ and its density p(s) is strictly positive. Let τn be independent copies of τ. Let Nt be a unit intensity Poisson process with signals Yn, independent of (τn). Finally, let A be a Borel set on [0, ∞) with Lebesgue measure 1. Put α = 1IA and define ∈ 0 the integral process for f L+(S) by the formula ∑ ξf = α (Yn p(τn)) f(τn). n Theorem 5.6 ξ is a Poisson measure on S with intensity λ.

Proof. Let us compute the Laplace transform ∏ −ξf −α(Yn p(τ)f(τ) E e = E N E τ e . n With the help of the function g(y) = − ln E e−α(y p(τ)) f(τ) and the identity 1 − e−αc = (1 − e−c)α, where α ∈ { 0, 1 }, we rewrite the latter expression as { } ∏ ∫ ∞ ( ) − − − E e g(Yn) = E e Ng = exp − 1 − e g(y) dy n 0 { ∫ ∞ ( ) } = exp − E 1 − e−α(y p(τ))f(τ) dy 0 { ∫ ∞ ∫ ( ) } = exp − 1 − e−f(s) α(y p(s)) p(s) λ(ds) dy 0 S

Using Fubini’s Theorem, in the “dy-integral” we substitute x = y p(s), so dx = p(s) dy, and since ∫ ∞ | | 0 α(x) dx = A = 1, the latter quantity becomes { ∫ ( ) } exp − 1 − e−f(s) λ(ds) . S That is, ξ is a Poisson measure with intensity λ.

Example. Let us construct a planar Poisson measure, for which we need a strictly positive densitiy.

E.g., we may pick the Gaussian density, for u = (u1, u2),

1 −(u2+u2)/2 1 ||u||2/2 p(u) = e 1 2 , q(u) = = 2π e 2π p(u)

31 so τn = (γn1, γn2), where γnk are independent N(0,1) random variables. Also, we choose A = [0, 1].

Let (Yn) form a unit intensity Poisson process Nt, independent of (τn). We observe that Vn = 2 ∥τn∥ /2 are exponential r.vs. with unit intensity. So we obtain ∑

ξf = 1I{Yn≤q(τn)} f(τn). n

5.6 Exercises

1. A Poisson random measure ξ is countably additive in every Lp, 0 < p < ∞

2. Show that the only solution of the functional Cauchy equation,

f(x + y) = f(x) + f(y), x, y ∈ R,

in the class of real continuous functions on R is the linear function f(x) = ax. Equivalently, within this class, the only solution of the functional equation

g(s + t) = g(s)g(t), s, t ≥ 0

is the exponential g(s) = eas. Hence the only continuous distribution that enjoys the lack of memory property is the exponential distribution.

3. Why do the age time At and the excess time Wt have the same distribution? Is this a property of Poisson process or any renewal process?

Find the probability distribution of UNt+1 = At + Wt for the Poisson process.

4. A Poisson process Nt on the positive half-line entails immediately a finite additive set function

N(a, b] = Nb − Na on the field spanned by the intervals (a, b]. Since its control measure is the Lebesgue measure times λ, show in few lines how this additive set function extends to a true random measure on Bore sets. Note: it is easier to construct the Poisson integral Nf first!

Then the random measure is simply N1IA. Clean details need to be written down.

5. In Corollary 5.4.4 a “paradox” is shown. Say, Auburn Transit buses arrive at a bus stop according to a Poisson distribution, say, with the average interarrival time 20 min. You come to the bus stop, there is no bus yet so you wait. How long, in average? 10 minutes, 15, 20? Yes, 20 is the answer. Also, the time between the moment of departure of the last bus before your arrival and the moment of your forthcoming ride would be... yes, 40 minutes, in average.

It’s a paradox, isn’t it? Or, perhaps not...

Similarly, if there are two lines to a service, say to a cash register or a ticket booth at a rock concert, and you choose one line, the other will move faster. So you’ll change the line. But

32 then the line that you just left will be mowing faster. That’s the fact and it has a logical explanation (the same phenomenon as in waiting for a bus).

Explain!

6. Show that the split Poisson processes Mj in Example 5.5.2 are independent Poisson with

parameters pjλ. Hint: for a fixed f show that, for Mj = Mjf, ∑ ∏ − c M − E e j j j = E e cj Mj j

Then, for finitely many fk with disjoint supports (so Nfk are independent): ∑ ∑ ∏ ∏ − c M f − E e j k j j k = E e cj Mj fk . j k

Argue that these relations prove the statement.

7. Let (S, S, λ) be an atomless (continuous) space. Let a ≤ λ(S). Then there exists A ∈ S such that λA = c. In particular, an infinite σ-finite measure space enjoys a partition into the union of probability spaces.

8. Let (S, S, µ) be a probability space. Consider the standard probability space (Ω, F, P) as the unit interval with Borel sets and the Lebesgue measure. Argue that there exists a measurable mapping X :Ω → S such that P(X ∈ A) = µA for every A ∈ S.

33 5.7 Non-positive awards

Let (ζn) be i.i.d. and copies of a ζ with distribution µ, independent of a Poisson process Nt with signals (Yn) and intensity λ. The integral ∑ Xf = ζn f(Yn) n is well defined, e.g., when f has a bounded support, e.g., for f = 1I(a,b] and linear combinations of such functions, say,

∑n ∑ ··· f = a01I{ 0 } + ak1I[tk−1,tk] = afk, 0 = t0 < t1 < tn = t k=1 k Then its ch.f. equals

{ ∫ ∞ ( ) } { ∫ ∞ ∫ ( ) } E eiXf = exp −λ E 1 − eiζf(t) dt = exp −λ 1 − eixf(t) µ(dx) dt . 0 0 R For the specific simple function listed above, it equals { } ∑ ∫ ( ) ∏ iakf(t) iakfk exp −λ (tk − tk−1) 1 − e µ(dx) = E e , R k k which shows that X is an independently scattered random measures with stationary increments.

Therefore, its FDD are fully described by one dimensional distributions, for f = 1I[0,t] { ∫ ( ) } E eiaXt = exp −λt 1 − eiax µ(dx) R

By the “gluing technique”, the introduced concept of a random measure can be extended even to infinite but σ-finite measure µ on R, restricted by the existence of the integral that appears in the characteristic function. Clearly, a sufficient condition is ∫ ∫ x2 µ(dx) + µ([−1, 1]c) = 1 ∧ x2 µ(dx) < ∞. |x|≤1 R

The finiteness of the first term on the left is obviously necessary. It can be shown that the second term must be finite necessarily but it requires some tedious reasoning, and we will not show it.

However, we will show details in the symmetric case.

Let’s begin with the simplest case of symmetric ±1-valued rewards. Let ξ be a Poisson random measure counting random points Yn in (S, S, λ), and let εn be a Rademacher sequence independent of ξ (and of (Yn)). Define ∑ ξf˜ = εn f(Yn). n

34 By Fubini’ theorem and properties of Rademacher series: ∑ ∑ ⇐⇒ 2 ∞ εnan converges an < , n n the series converges in probability, or, equivalently a.s., if and only if ∑ 2 2 ξf = |f(Yn)| < ∞ n and this happens if and only if ∫ ( ) 2 1 − e−f (s) λ(ds) < ∞. S Observe that we do not need to restrict ourselves to nonnegative functions (or differences of such). Instead of Laplace transforms we rather use the characteristic functions. Because of symmetry, the ch.f. is real and equals { ∫ ( ) } ˜ E eiξf = exp − 1 − cos f(s) λ(ds) . S We shall call ξ˜ a symmetrized Poisson random measure (SPRM) with intensity λ. The above existence condition can be replaced by a more elegant condition: ∫ 1 ∧ f 2(s) λ(ds) < ∞. (5.8) S Now, we will examine some of previously discussed variants in this new context.

D Symmetric rewards. Let Rn be independent copies of a symmetric r.v. R, i.e. R = −R. D Therefore, R = ε R, where ε and R are independent. Assume also that (Rn) is independent of the Poisson measure ξ. As before, put ∑ Mf = Rn f(Yn), n where the series converges if and only if ∫ E 1 ∧ R2f 2(s) ds < ∞. S The ch.f. E eiMf equals { ∫ ( ) } { ∫ ∫ ( ) } exp − E 1 − cos R f(s) λ(ds) = exp − 1 − cos x f(s) µ(dx) λ(ds) (5.9) S S R So M is a SPRM on S × R with intensity λ ⊗ µ, where µ = L(R). In fact

D Mf = ξLf, where L : S × R → R,L(s, x) = x f(s). (5.10)

We observe that the restriction to a probability or even finite measure µ is not necessary. A potential extension is controlled by condition (5.8). Consider a standard Poisson process on S = R+ with unit intensity. Let µ be a measure on R whose properties need to be found and let ξ be a PRM on R × [0, ∞) with intensity µ ⊗ Leb.

35 Lemma 5.7 The inner integral in (5.9) is finite over the class of functions f that contains indi- cators iff ∫ 1 ∧ x2 µ(dx) < ∞ (5.11) R Proof. The statement follows from (5.8) and the inequalities

(a2 ∧ 1) (1 ∧ x2) ≤ 1 ∧ (ax)2 ≤ (a2 ∨ 1) (1 ∧ x2)

Definition. A Borel measure µ on R is called a L´evymeasure if (5.11) holds.

Thus, a PRM ξ on R × R+ with intensity dµ ⊗ dt, where µ is a L´evymeasure, entails a process Mf by (5.10) with the ch.f.

{ ∫ ∞ ∫ ( ) } E eiMf = exp − 1 − cos x f(s) µ(dx) ds 0 R

In particular, for functions f1, . . . , fn with disjoint supports, Mf1, . . . , Mfn are independent. Hence, ≥ ··· if fk = 1I(a+tk−1,a+tk], k = 1, . . . , n, where a 0 and t0 = 0 < t1 < tn, { } ∑ ∏ { ∫ } E exp −i ckMfk = exp −(tk − tk−1) (1 − cos x) µ(dx) . R k k

In other words, the stochastic process Mt = M1I[0,t] has independent and stationary increments.

36 5.8 SSα - symmetric α-stable processes

3 For α > 0 define the symmetric measure µα by the formula 1 µ [x, ∞) = , x > 0 α xα

Equivalently, µα has the density α g (x) = , x ≠ 0. α |x|α+1

Lemma 5.8 µα is a L´evymeasure if and only if α < 2.

Put S = R \{ 0 }. Consider the Poisson measure M on S × R+. That is, { ∫ ∞ ∫ ( ) } iMf − − α E e = exp 1 cos xf(t) α+1 dx dt 0 S |x|

By symmetry we may consider the integral for x > 0, and then change4 the variable x|f(t)| 7→ x, so the ch.f. equals

{ ∫ ∞ } ∫ ∞ − | |α − dx exp cα f(t) dt , where cα = 2α (1 cos x) α+1 . 0 0 x

−1/α In particular, taking f = cα 1I(a,b],

− − − | |α E eit(Mb Ma) = e (b a) t .

Definition A random variable X with the ch.f.

α E eitX = e−a|t| is called symmetric α-stable, or SSα in short. Mf, MA, Mt are called then SSα integral, measure, process - respectively.

−1/α Also, taking fk = cα 1IAk , where Ak are disjoint of unit Lebesgue measure ( ) ∑ ∑ 1/α D α Xk = Mfk, k ∈ N ⇒ fk are i.i.d. and akfk = |ak| X1. k k

Example 5.9 (Le Page representation) Let Yn be Poisson points with unit intensity, τn be i.i.d. uniform on [0, 1], εn be Rademacher r.vs., and the three sequences be independent. Let α ∈ (0, 2). Then ∑ f(τ ) Mf = ε n n 1/α n Yn

3µ(A) = µ(−A) 4the cosine is an even function

37 is a SSα process/integral/measure.

Indeed, even in a more general case ∑ ∑ 2 2 Mf = εn f(τn) ϕ(Yn) converges iff f (τn) ϕ (Yn) < ∞ a.s. n n and the necessary and sufficient condition is ∫ ∫ ∞ 1 1 ∧ f 2(t) ϕ2(y) dt dy < ∞. 0 0

The ch.f. equals { ∫ ∫ } ∞ 1 ( ) E eiMf = exp − 1 − cos f(t) ϕ(y) dt dy . 0 0 Returning to the original function ϕ(y) = y−1/α, using Fubini’s theorem and the substitution y = |f(t)|αx−α, we obtain { ∫ ∫ } { ∫ } 1 ∞ ( ) 1 iMf −1/α α E e = exp − 1 − cos f(t) y dt dy = exp −cα |f(t)| dt 0 0 0 with ∫ ∞ 1 − cos x cα = α 1+α dx. 0 x

38 5.9 Exercises

1. Let Nt be a standard Poisson process with arrivals Yn and A be a bounded Borel set. Show

that P(Yn ∈ A for infinitely many n) = 0. Deduce then that f(Yn) = 0 eventually with probability 1 when f has a bounded support.

2. Verify that the symmetric measure µ on R \ 0 with the tail µ(x, ∞) = x−α is a L´evymeasure iff α ∈ (0, 2).

3. Show that the p the moment E |X|p of a SSα r.v. is finite iff p < α.

4. Let Xk be i.i.d. SSα. Let p ∈ [0, α). Show that ∑ ∑ 0 α akXk converges in L and a.s. ⇐⇒ |ak| < ∞. k k

In particular, for p ∈ (0, α)

∑ ∥ ∥ akXk = a α, k p i.e., every F-space (for α < 1) or Banach space (for α ∈ (1, 2)) contains a subspace isometric with ℓα. That it is true also for α = 2 was proved previously (in lieu of stable we can use Rademacher or Gaussian i.i.d. r.vs.)

5. (added here although it belongs to the previous topic). Consider the paraboloid of revolution given by the equation z = x2 + y2. Project the disk of radius r that lies on the xy-plane

to the paraboloid’s surface, obtaining a set A. Let Yn be Poisson points on the paraboloid, controlled by the surface area. Find the probability that A has no Poisson points.

More difficult: Construct Poisson points on the paraboloid.

More difficult: Let S be a smooth connected unbounded surface, say, given by a parametric equation r = r(u, v), where (u, v) ∈ D, where D is an open domain in R2, and r ∈ C1. Construct Poisson points on S.

(Hint: Show that w.l.o.g D = R2. Construct Poisson points on the plane. Carry them by some mapping into S. The Jacobian will be involved.).

39 6 Infinitely divisible distributions

6.1 Preliminaria

Recall that the ch.funs. φ1, . . . , φn of independent r.vs X1,...,Xn satisfy the formula { } ∑ E exp i tkXk = φ1(t1) ··· φ(tn) k

(which is also sufficient for independence). For just two independent r.vs. X_1, X_2 that make the sum X = X_1 + X_2 we have, in terms of their probability laws µ_1, µ_2, and µ:
$$\mu(A) = \int_{\mathbb{R}} \mu_1(A - x)\, \mu_2(dx) = \int_{\mathbb{R}} \mu_2(A - x)\, \mu_1(dx),$$
which can be equivalently stated (Exercise: prove it) in terms of the integrals
$$\mu F = \int_{\mathbb{R}} F(x)\, \mu(dx) = \mathbb{E}\, F(X) = \int_{\mathbb{R}} \mu_1 F(\cdot + y)\, \mu_2(dy) = \int_{\mathbb{R}} \mu_2 F(\cdot + x)\, \mu_1(dx).$$
The "dot" inside indicates integration along a hidden variable, e.g.:
$$\mu_1 F(\cdot + y) = \int_{\mathbb{R}} F(x + y)\, \mu_1(dx).$$

When both partial measures are absolutely continuous and f_1, f_2 denote their densities, the density of the sum equals
$$f_{X_1 + X_2}(z) = \int_{\mathbb{R}} f_1(z - y)\, f_2(y)\, dy = \int_{\mathbb{R}} f_2(z - x)\, f_1(x)\, dx.$$
The operation produces a new measure or a new density, called the convolution of the measures or densities and denoted by µ_1 ∗ µ_2 or f_1 ∗ f_2. The extension to any finite number of terms follows immediately. If µ_1 = ⋯ = µ_n = µ we may write µ^{∗n} = µ_1 ∗ ⋯ ∗ µ_n. In the language of random variables the convolution n-th power is the probability law of the sum X_1 + ⋯ + X_n of i.i.d. r.vs. with L(X_1) = µ.
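As a quick numerical illustration of the convolution formula (a sketch, not from the notes; the choice of Exp(1) densities is arbitrary): the convolution of two Exp(1) densities is the Gamma(2, 1) density z e^{-z}, and a Riemann-sum approximation of the integral reproduces it.

    import numpy as np

    dz = 0.001
    z = np.arange(0, 20, dz)
    f1 = np.exp(-z)                              # density of X1 ~ Exp(1)
    f2 = np.exp(-z)                              # density of X2 ~ Exp(1)
    conv = np.convolve(f1, f2)[:len(z)] * dz     # Riemann sum for int f1(y) f2(z - y) dy
    exact = z * np.exp(-z)                       # Gamma(2, 1) density of X1 + X2
    print(np.max(np.abs(conv - exact)))          # small discretization error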

Now let us look at this pattern from the opposite point of view. Let L(X) = ν and ν̂ = ψ. Is it possible to write
$$X = X_1 + \cdots + X_n,$$
where the X_k are i.i.d. r.vs.? Equivalently, does there exist a ch.f. ϕ such that ψ = ϕⁿ? In other words, is ψ^{1/n} a ch.f.? (It does not matter which of the n complex roots we consider; for simplicity we choose the principal root.) If this is possible for every n ∈ ℕ, the distribution and its ch.fun. are called infinitely divisible (ID). If the measure is supported by the positive halfline, we may use the Laplace transform in lieu of the ch.f. So, if L is the Laplace transform of a probability measure, is L^{1/n} such, too?

First, we will find a counterexample. Suppose that ψ is ID. Then so is its conjugate ψ̄, and consequently |ψ|² is ID. The limit of ch.funs.
$$\phi = \lim_n |\psi|^{2/n}$$
takes only the two values 0 and 1. Since |ψ| > 0 on a neighborhood of 0, we have ϕ = 1 on that neighborhood; being continuous at 0, ϕ is a ch.f. by the continuity theorem, hence continuous everywhere, and so it must equal 1 everywhere. Hence ψ ≠ 0 everywhere. Let us repeat:

An ID ch.f. never vanishes.

Thus, as an example of a non-ID distribution it suffices to take one with a ch.f. vanishing at some point. For example, consider the uniform r.v. on [0, 1]. Its ch.f.
$$\frac{e^{it} - 1}{it}$$
vanishes for t = 2nπ, n ∈ ℤ \ { 0 }.
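A two-line numerical check (illustrative only): writing the ch.f. of Uniform[0, 1] as e^{it/2} sin(t/2)/(t/2), its modulus vanishes exactly at the nonzero multiples of 2π.

    import numpy as np

    phi = lambda t: np.exp(1j * t / 2) * np.sinc(t / (2 * np.pi))   # ch.f. of Uniform[0, 1]
    print(np.abs(phi(2 * np.pi * np.arange(1, 4))))                 # ~ 0 at t = 2pi, 4pi, 6pi
    print(np.abs(phi(np.array([1.0, 3.0, 5.0]))))                   # nonzero in between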

Also the "tent function", which is a ch.f. by the Pólya criterion, is not ID.

6.2 A few theorems

We note that the class of ID distributions is closed under convolution (cf. the exercises).

Theorem 6.1 The class of ID probability distributions is closed under weak limits.

Proof. Let φ_k be ID and φ_k → φ_0. Let n ∈ ℕ. Then |φ_k|² are real ch.funs. for k = 0, 1, 2, … and ID for k = 1, 2, …. So, in the latter case, |φ_k|^{2/n} are ch.funs. But

$$|\varphi_0|^{2/n} = \lim_k |\varphi_k|^{2/n},$$

and the limit is continuous at 0, so by the continuity theorem |φ_0|^{2/n} is a ch.fun. for every n ∈ ℕ. That is, |φ_0|² is ID. As such, it has no zeros, but then φ_0 has no zeros, and we can thus define its n-th root as well as the n-th roots of the φ_k's:
$$\varphi_0^{1/n}(t) = \exp\Bigl\{ \frac{1}{n} \ln \varphi_0(t) \Bigr\} = \lim_k \exp\Bigl\{ \frac{1}{n} \ln \varphi_k(t) \Bigr\} = \lim_k \varphi_k^{1/n}(t), \qquad n \in \mathbb{N}.$$

That is, φ_0^{1/n} is continuous at 0 and is the limit of ch.funs., so it is itself a ch.fun. Hence φ_0 is ID.

Corollary 6.2 Let ϕ be an ID ch.f. and α > 0. Then ϕ^α is ID.

Proof. For rational α the statement is obvious. For irrational α we pass to the limit along rationals and apply the continuity theorem.

If α is irrational, then the latter property is hard, if not impossible, to express in the language of random variables or probability distributions.

6.3 A side trip: decomposable distributions

A probability distribution µ is called decomposable if there are nontrivial probability distributions

µ_1, µ_2 such that µ = µ_1 ∗ µ_2. In the language of random variables, X is decomposable if there exist independent X_1 and X_2, neither degenerate, such that

X = X1 + X2

We exclude degenerate r.vs. from the class of decomposable ones to avoid the triviality:

X = (X − a) + a.

Note that the decomposability may have a finite depth, that is, some of the summands may be non-decomposable. Even if the law can be split into an arbitrary finite number of parts, these might not be identical.

Call a r.v. X self-similar or c-decomposable if
$$X \overset{D}{=} cX + R,$$
where X and R are independent and R is non-degenerate. It follows by iteration that X can be decomposed into a sum of any length consisting of independent summands:
$$X \overset{D}{=} cX + R \overset{D}{=} c(cX + R') + R = c^2 X + cR' + R \overset{D}{=} \cdots \overset{D}{=} c^n X + \sum_{k=1}^{n} c^{k-1} R_k,$$
where all r.vs. on the right side are independent and the R_k's are copies of each other. In particular, if X is c-decomposable, then it is c^n-decomposable for every n ∈ ℕ. We observe that this is an attribute of the probability distribution or its transform rather than of the random variable itself. That is, the property reads, for µ = L(X), with µ_c = L(cX) and φ = µ̂:
$$\exists\, \nu:\ \mu = \mu_c * \nu \qquad\text{or, equivalently,}\qquad \frac{\varphi(t)}{\varphi(ct)}\ \text{is a ch.fun.}$$
We note that SSα and Gaussian distributions are c-decomposable for every c ∈ (−1, 1).
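A quick illustration of the last remark (my sketch, not from the notes): for a standard Gaussian X and any c ∈ (−1, 1) one may take the residue R Gaussian with variance 1 − c², since e^{−t²/2} = e^{−c²t²/2} e^{−(1−c²)t²/2}; the same algebra applied to e^{−a|t|^α} handles the SSα case.

    import numpy as np

    rng = np.random.default_rng(2)
    c, n = 0.7, 10 ** 6
    X = rng.standard_normal(n)                        # X ~ N(0, 1)
    R = np.sqrt(1 - c ** 2) * rng.standard_normal(n)  # residue R ~ N(0, 1 - c^2), independent of X
    Y = c * X + R                                     # should again be N(0, 1)
    print(Y.mean(), Y.var())                          # ~ 0 and ~ 1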

The uniform random variable V on [−1, 1] is 1/k-decomposable for every natural number k. Indeed, since its Fourier transform is sin t/t, we have
$$\frac{\sin kt}{k \sin t} = \Bigl(1 - \frac{1}{k}\Bigr) \cos t\; \frac{\sin (k-1)t}{(k-1)\sin t} + \frac{1}{k} \cos (k-1)t,$$
i.e.,
$$V \overset{D}{=} \frac{1}{k}\bigl( V + R_k \bigr), \qquad R_k \overset{D}{=} \bigl( 1 - D_{1/k} \bigr) \bigl( R_{k-1} + \varepsilon_{k-1} \bigr) + D_{1/k}\,(k-1)\,\varepsilon_k,$$
where the D_{1/k} are (1/k)-Bernoulli, the ε_k are Rademacher variables (i.e., (1 + ε_k)/2 are 1/2-Bernoulli), and all sequences are independent. Therefore, all uniform random variables are 1/k-decomposable,

being affine transformations of each other: U_{[a,b]} = ((b − a)/2) U_{[−1,1]} + (b + a)/2. In general, if a variable with possible negative values (or bounded away from 0) has the property Y =_D cY + R, where the residue R has an atom at its minimum, P(R = m(R)) > 0, then Y − m(Y) is c-decomposable for some c. In the uniform case there is a simpler direct argument.

Proposition 6.3 A uniform random variable U on [0, 1] belongs to the class S(c) if and only if c = p = m^{−1} for some natural number m. In this case
$$U \overset{D}{=} \frac{1}{m}\, U + D_m \cdot Z_m,$$
where D_m denotes a (1 − 1/m)-Bernoulli r.v. and Z_m has the discrete uniform distribution
$$\frac{1}{m-1} \sum_{k=1}^{m-1} \delta_{k/m}$$
on { k/m : k = 1, …, m − 1 }. The three variables U, D_m, Z_m are independent.

Proof. Consider the binary series representation of U:
$$U \overset{D}{=} \sum_{n=0}^{\infty} \frac{D_n}{2^{n+1}}, \quad\text{i.e.}\quad U \overset{D}{=} \frac{1}{2}\, U + \frac{1}{2}\, D_0,$$
where the D_n are i.i.d. (1/2)-Bernoulli, so c = 1/2 is admissible. Other admissible parameters c come from the equation
$$M(s) = \frac{L(s)}{L(cs)} = c\, \frac{1 - e^{-s}}{1 - e^{-cs}} = p + (1 - p) H(s),$$
where the residue has an atom of mass p at 0 and H is the Laplace transform of the remaining, normalized part. Comparing the limits as s → ∞, clearly c = p. Hence the sought-for Laplace transform H(s) would equal
$$\frac{p}{1 - p}\; \frac{e^{-ps} - e^{-s}}{1 - e^{-ps}}.$$

Then, denoting the Dirac delta measure at the point c by δ_c, we have
$$\frac{e^{-ps}}{1 - e^{-ps}} = L\Bigl( \sum_{k=1}^{\infty} \delta_{kp} \Bigr), \qquad \frac{e^{-s}}{1 - e^{-ps}} = L\Bigl( \sum_{k=0}^{\infty} \delta_{kp+1} \Bigr).$$
Thus the signed measure of which H(s) is the Laplace transform,
$$L^{-1}(H) = \frac{p}{1 - p} \Bigl( \sum_{k=1}^{\infty} \delta_{kp} - \sum_{k=0}^{\infty} \delta_{kp+1} \Bigr),$$
is nonnegative if and only if p = 1/m for some m = 1, 2, …. Thus H(s) is the Laplace transform of the uniform discrete probability on { k/m : k = 1, …, m − 1 }.
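A simulation sketch of Proposition 6.3 (illustrative only; m = 4, the sample size, and the use of scipy's Kolmogorov-Smirnov test are my choices): the right-hand side U/m + D_m Z_m, built from independent pieces, should again be uniform on [0, 1].

    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(3)
    m, n = 4, 200_000
    U = rng.uniform(size=n)                     # independent copy of U
    D = rng.uniform(size=n) < 1 - 1 / m         # (1 - 1/m)-Bernoulli D_m
    Z = rng.integers(1, m, size=n) / m          # discrete uniform Z_m on {1/m, ..., (m-1)/m}
    Y = U / m + D * Z
    print(kstest(Y, "uniform"))                 # p-value should not be small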

Example 6.4 Replacing the parameter c = 1/2 by 1/3 yields a c-decomposable variable with the singular Cantor-Lebesgue distribution:
$$V = \sum_{n=0}^{\infty} \frac{2 D_n}{3^{n+1}}, \quad\text{i.e.}\quad V \overset{D}{=} \frac{1}{3}\, V + \frac{2}{3}\, D_0.$$
Whether its decomposability semigroup extends beyond { 1/3^n } is yet to be determined.

6.4 ID of Poisson type

The inspiration: Section 5.4 in Lukacs' book.

Let N_λ be a Poisson r.v. with intensity λ and a > 0. Then the ch.fun. of aN_λ is exp{ λ(e^{ita} − 1) }. A Poisson integral of a simple function is said to be of Poisson type in the literature. In other words, a r.v. is of Poisson type if, for a finite choice of parameters a_k, λ_k > 0 and independent Poisson r.vs. N_{λ_k},
$$X = \sum_k a_k N_{\lambda_k} = Ng = \int_S g\, dN,$$
where g = Σ_k a_k 1I_{A_k} and the A_k are disjoint with λ(A_k) = λ_k. We carry the name over to probability distributions and ch.funs. So, a ch.fun. ψ is of Poisson type iff
$$\psi(t) = \prod_k \exp\bigl\{ \lambda_k \bigl( e^{ita_k} - 1 \bigr) \bigr\} = \exp\Bigl\{ p \sum_k p_k \bigl( e^{ita_k} - 1 \bigr) \Bigr\}, \qquad (6.1)$$
where p = Σ_k λ_k and p_k = λ_k/p make a discrete probability distribution µ = Σ_k p_k δ_{a_k}. In other words, a Poisson type ch.fun. ψ, obtained from φ = µ̂ by the formula

$$\psi = \exp\{\, p\,(\varphi - 1) \,\}, \qquad (6.2)$$
is an ID ch.fun.
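A small numerical sketch of (6.1)-(6.2) (the jump sizes a_k and intensities λ_k below are made-up values): the empirical ch.f. of X = Σ_k a_k N_{λ_k} is compared with exp{ Σ_k λ_k (e^{it a_k} − 1) }.

    import numpy as np

    rng = np.random.default_rng(4)
    a = np.array([0.5, 1.0, 2.0])        # jump sizes a_k (hypothetical)
    lam = np.array([1.0, 0.3, 0.7])      # intensities lambda_k (hypothetical)
    n = 200_000
    X = (a * rng.poisson(lam, size=(n, len(a)))).sum(axis=1)   # X = sum_k a_k N_{lambda_k}

    t = np.linspace(-3, 3, 7)
    empirical = np.array([np.exp(1j * s * X).mean() for s in t])
    formula = np.exp((lam * (np.exp(1j * np.outer(t, a)) - 1)).sum(axis=1))
    print(np.max(np.abs(empirical - formula)))                 # small Monte Carlo error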

Lemma 6.5 Every ch.fun. of the form (6.2) is ID.

Proof. Let φ be a ch.fun., p > 0, and n > p. Then the power of the convex combination
$$\psi_n = \Bigl( \Bigl( 1 - \frac{p}{n} \Bigr) + \frac{p}{n}\, \varphi \Bigr)^{n}$$
is a ch.fun., and so is its limit as n → ∞ (the limit is continuous at 0, so the continuity theorem applies). However, the limit is equal to the right-hand side of (6.2). Let us repeat:

(*) (6.2) is a ch.fun. for every p > 0.

To prove ID we must see that ψ^{1/m} is a ch.fun. for every m ∈ ℕ. But
$$\psi^{1/m} = \lim_{k \to \infty} \Bigl( \Bigl( 1 - \frac{p}{km} \Bigr) + \frac{p}{km}\, \varphi \Bigr)^{k} = \exp\bigl\{ (p/m)(\varphi - 1) \bigr\},$$
which is a ch.fun. by (*).

Proposition 6.6 (De Finetti’s Theorem) A ch.fun. is ID iff it has the form

$$\psi(t) = \lim_m \exp\{\, p_m (\varphi_m(t) - 1) \,\} \qquad (6.3)$$
for some constants p_m > 0 and ch.funs. φ_m.

Proof. The sufficiency follows by the continuity theorem. To prove the necessity, let ψ be ID. Then ψ^α is ID for every α > 0. Hence
$$\exp\Bigl\{ \frac{1}{\alpha} \bigl( \psi^{\alpha} - 1 \bigr) \Bigr\}$$
is an ID ch.fun. by the preceding argument. So is ψ, obtained as the limit for α → 0. Now choose α = 1/m, p_m = m, and φ_m = ψ^{1/m}. That is, ψ can be represented as the desired limit.

Now we will see that it suffices to consider only Poisson types among the above φ_m's.

Theorem 6.7 A ch.fun. ψ is ID iff the representation (6.3) can be chosen so that each term is of the Poisson type (6.1).

Proof. The sufficiency follows from the continuity (or De Finetti's) theorem. Let ψ be an ID ch.fun. and consider its form (6.3), ensured by De Finetti's Theorem, with φ_m = µ̂_m. We choose discrete µ_mk → µ_m weakly as k → ∞. That is, φ_mk = µ̂_mk → φ_m.

Then the statement follows by the diagonal argument.

6.5 Lévy-Khinchin formula

This is inspired by the presentation in Loève's book, Section 22.1. However, the original approach, which used analysis, Riemann-Stieltjes integrals, etc., in great detail and with great care, has been "translated" into the language of Poisson integrals with the help of our sufficient background in measure theory.

Recall that a Lévy measure M on ℝ is defined by the condition
$$\int_{\mathbb{R}} 1 \wedge |x|^2\, M(dx) < \infty.$$

In this condition we may replace the function 1 ∧ |x|² by any bounded monotonic continuous function that behaves like |x|² near 0. So, as the defining condition we may prefer
$$\int_{\mathbb{R}} \frac{x^2}{1 + x^2}\, M(dx) < \infty,$$
or, in other words, that the measure x²/(1 + x²) M(dx) is finite, i.e., a probability up to a positive scalar multiplier. This probability µ satisfies

$$\frac{1 + x^2}{x^2}\, c\, \mu(dx) = M(dx), \qquad (6.4)$$
for some c > 0. For a fixed t let us examine the behavior of the function
$$f(x) = e^{itx} - 1 - \frac{itx}{1 + x^2}.$$

We see that f(x) is bounded, and
$$f(x) \approx itx - \frac{t^2 x^2}{2} - \frac{itx}{1 + x^2} \approx -\frac{t^2 x^2}{2}, \qquad |x| \to 0.$$
Hence, for every Lévy measure M the integral
$$\int_{\mathbb{R}} f(x)\, M(dx)$$
exists, and thus the continuous function
$$\psi(t) = \exp\left\{ iat + \int_{\mathbb{R}} \Bigl( e^{itx} - 1 - \frac{itx}{1 + x^2} \Bigr) M(dx) \right\} \qquad (6.5)$$
is well defined, and ψ(0) = 1. Using (6.4), we rewrite it:
$$\psi(t) = \exp\left\{ iat + c \int_{\mathbb{R}} \Bigl( e^{itx} - 1 - \frac{itx}{1 + x^2} \Bigr) \frac{1 + x^2}{x^2}\, \mu(dx) \right\}.$$
Consider discrete µ_m = Σ_k p_{mk} δ_{x_{mk}} with finite supports not containing 0 that converge to µ, and let ψ_m denote the function obtained by substituting µ_m for µ above. Then
$$\ln \psi_m(t) = iat + c \sum_k p_{mk} \Bigl( e^{itx_{mk}} - 1 - \frac{itx_{mk}}{1 + x_{mk}^2} \Bigr) \frac{1 + x_{mk}^2}{x_{mk}^2} = c \sum_k p_{mk}\, \frac{1 + x_{mk}^2}{x_{mk}^2} \bigl( e^{itx_{mk}} - 1 \bigr) - itc \sum_k \frac{p_{mk}}{x_{mk}} + iat = \ln \varphi_m(t) - it a_m,$$
where the φ_m are of Poisson type (6.1) and a_m = c Σ_k p_{mk}/x_{mk} − a. Thus the ψ_m are ID ch.funs. and, by Theorem 6.1, so is ψ.
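As a numerical illustration connecting (6.5) with Section 5.8 (a sketch under my own choice of α and t; scipy is assumed): for the symmetric Lévy measure M(dx) = α|x|^{−α−1} dx and a = 0 the compensating term itx/(1 + x²) integrates to zero by symmetry, and the exponent in (6.5) reduces to ∫ (cos tx − 1) M(dx) = −c_α |t|^α.

    import numpy as np
    from scipy.integrate import quad

    alpha, t = 1.5, 2.0
    # exponent of (6.5) for M(dx) = alpha |x|^(-alpha-1) dx, a = 0 (factor 2: both half-lines)
    exponent = 2 * quad(lambda x: (np.cos(t * x) - 1) * alpha * x ** (-alpha - 1),
                        0, np.inf, limit=200)[0]
    c_alpha = 2 * alpha * quad(lambda x: (1 - np.cos(x)) / x ** (alpha + 1),
                               0, np.inf, limit=200)[0]
    print(exponent, -c_alpha * abs(t) ** alpha)        # the two numbers should agree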

Let us denote the ch.f. given by (6.5) by (a, M).

Proposition 6.8 ((a,M)-uniqueness Theorem) The pair (a, M) is unique, i.e. (a, M) = (a′,M ′) implies that a = a′ and M = M ′.

Proof. Notice that the functions iat and the one defined by the integral are linearly independent. Hence (a, M) = (a′, M′) implies that a = a′. Assume then that a = 0 and that ln ψ has two integral representations, with M and M′, or, equivalently, with the two corresponding probability measures µ and µ′.

We need to show that the function
$$\phi(t) = \int_{\mathbb{R}} \Bigl( e^{itx} - 1 - \frac{itx}{1 + x^2} \Bigr) M(dx)$$
uniquely determines the probability µ. To this end, scale the variable, integrate, use Fubini's theorem, and exponentiate:
$$\exp\Bigl\{ \frac{u}{2} \int_{-1}^{1} \phi(ut)\, dt \Bigr\} = \exp\Bigl\{ \int_{\mathbb{R}} \Bigl( \frac{\sin ux}{x} - u \Bigr) M(dx) \Bigr\} = \exp\Bigl\{ \int_{\mathbb{R}} \int_0^{u} \bigl( \cos xv - 1 \bigr)\, dv\, M(dx) \Bigr\}.$$

Thus ϕ determines the distribution of a Poisson random measure on ℝ₊ × ℝ with intensity Leb ⊗ M. It remains to see that this distribution determines M uniquely (cf. the exercises).

Proposition 6.9 ((a,M)-convergence Theorem)

(an,Mn) → (a, M) iff an → a and Mn → M weakly (which can be expressed in terms of the convergence of the corresponding measures µn).

Further, if (a_n, M_n) → ψ, continuous at the origin, then ψ = (a, M) for some real a and some Lévy measure M.

Proof. The sufficiency is obvious. The necessity will follow by the previous approach. We infer that ϕn → ϕ implies that the distributions of the Poisson measures with mean Leb ⊗ Mn converge weakly to the distribution of a Poisson measure with mean Leb ⊗ M. Hence Mn → M.

Proposition 6.10 (Lévy-Khinchin) Every ID ch.fun. has the unique representation (6.5).

Proof. As we have seen, (6.5) entails ID. Now let ψ be an ID ch.fun.; then so is ψ^{1/n} for every n. Let the latter be the ch.fun. of some probability µ_n. Thus
$$\psi(t) = \lim_n \exp\bigl\{ n \bigl( \psi^{1/n}(t) - 1 \bigr) \bigr\} = \lim_n \exp\Bigl\{ \int_{\mathbb{R}} \bigl( e^{itx} - 1 \bigr)\, n\, \mu_n(dx) \Bigr\}.$$

Rewrite the integral in the exponent:
$$it \int_{\mathbb{R}} \frac{n x}{1 + x^2}\, \mu_n(dx) + \int_{\mathbb{R}} \Bigl( e^{itx} - 1 - \frac{itx}{1 + x^2} \Bigr)\, \frac{1 + x^2}{x^2} \cdot \frac{x^2}{1 + x^2}\, n\, \mu_n(dx) = \ln (a_n, M_n).$$

By the convergence theorem (Proposition 6.9), a_n → a and M_n → M for some a and some Lévy measure M. So ψ = (a, M).

6.6 Exercises

1. Find an example of a r.v. or probability distribution that is not ID although its ch.f. never vanishes. Hint: try simple discrete (even two-valued) r.vs.

2. Examples of ID laws: Normal, Poisson, stable, exponential, gamma. Prove (or just observe): if a one-parameter family of ch.funs. is of the form φ(t) = φ_θ(t) = exp{ −θ p(t) }, where θ may vary through ℝ or [0, ∞), then φ is ID.

3. Convolutions of ID are ID. If φ is ID so is |φ|.

4. Let V be exponential, so it is ID. Is V^α (the so-called Weibull r.v.) ID?

5. Show that a non-degenerate distribution concentrated on finitely many points is not ID. It might not even be decomposable! Find an example (e.g., the binomial distribution is decomposable but...).

6. Let X be c-decomposable. Prove that necessarily |c| < 1. Infer from this that every c-decomposable r.v. can be written as
$$X \overset{D}{=} \sum_{n=0}^{\infty} R_n c^n,$$
where the R_n are i.i.d., and so this provides a large class of examples.

7. Let X = Σ_k a_k N_{λ_k}, where a_k, λ_k > 0, the N_{λ_k} are independent Poisson, and the infinite series converges in distribution. Show that it converges a.s.

8. In the proof of Theorem 6.7 the "diagonal argument" was used. Write precisely all of its details (beginning with "Let ϵ > 0 ..."). Hint: let (x_mk) be a matrix of elements of a metric space such that x_m = lim_k x_mk exists for every m, and also x = lim_m x_m exists. Then there is a subsequence k_m such that lim_m x_{m k_m} = x. Although the statement in the theorem involves a pointwise convergence of functions, which is not metrizable in general, the metric convergence must be used somehow.

9. Prove that the function defined in (6.5) is continuous.

10. Prove that the functions iat and the one defined by the integral in (6.5) are linearly independent.

11. At the end of the uniqueness and convergence theorems for (a, M) there were three statements.

(a) First, w.l.o.g., we may assume that an atomless Lévy measure M is a probability measure. In fact, M = Σ_k M_k, where the M_k are probabilities. Examine the details in both statements.
(b) Let ξ, ξ′ be Poisson measures on ℝ₊ × ℝ with intensities Leb ⊗ M and Leb ⊗ M′. If ξ =_D ξ′, then M = M′. Prove it.

(c) If the distributions of the Poisson measures with mean Leb ⊗ M_n converge weakly to the distribution of a Poisson measure with mean Leb ⊗ M, then M_n → M. Prove it.

(d) Let the distributions of the Poisson measures with mean Leb ⊗ M_n converge weakly to some distribution. Then it must be the distribution of some Poisson random measure with mean Leb ⊗ M.
