Subject - Paper - I Module - Distribution Functions II

Distribution functions were defined in the previous module. In this module, we consider a few more of their properties. We begin with an example.

Example 1 Consider the distribution function
\[
F(x) =
\begin{cases}
0, & \text{if } x < 0, \\
1 - p, & \text{if } 0 \le x < 1, \\
1, & \text{if } x \ge 1.
\end{cases}
\]

For definiteness, take p = 0.6. The figure below shows the graph of F (x) against x.

As is evident, there is a jump discontinuity at $x = 0$, of magnitude 0.4, and another one at $x = 1$, of magnitude 0.6. ♦

We make a few comments on jumps:

(i) If a distribution function $F$ has exactly one jump (of magnitude 1) at $x = c$ (a real constant), we say $F$ is degenerate at $c$, or $F$ is the point mass at $c$. If a random variable $X$ has this $F$ as its distribution function, then $P\{X = c\} = 1$.

(ii) $F$ can have either a finite or a countably infinite number of jump discontinuities.
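The jump magnitudes quoted in Example 1 can be checked numerically. The following is a minimal Python sketch, assuming p = 0.6 as in the example; the helper names `F` and `jump` and the offset `eps` are illustrative choices, not part of the module, and the left limit F(x - 0) is only approximated.

```python
def F(x, p=0.6):
    """Distribution function of Example 1 (p = 0.6 assumed by default)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0 - p
    return 1.0


def jump(F, x, eps=1e-9):
    """Approximate the jump F(x) - F(x - 0) using a small left offset eps."""
    return F(x) - F(x - eps)


if __name__ == "__main__":
    for point in (0.0, 0.5, 1.0):
        print(point, jump(F, point))
```

At x = 0 and x = 1 the computed jumps are approximately 1 - p and p respectively, and at any other point the jump is zero.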

Example 2 A random variable $X$, defined on some probability space $(\Omega, \mathcal{A}, P)$, has the Geometric distribution, with parameter $\theta \in (0, 1)$, if its probability mass function is given by
\[
P\{X = x\} =
\begin{cases}
\theta(1 - \theta)^{x}, & \text{if } x = 0, 1, 2, \dots \\
0, & \text{otherwise.}
\end{cases}
\]
The corresponding distribution function is

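\[
F_X(x) = P\{X \le x\} =
\begin{cases}
0, & \text{if } x < 0, \\
\theta, & \text{if } 0 \le x < 1, \\
\theta + \theta(1 - \theta), & \text{if } 1 \le x < 2, \\
\theta + \theta(1 - \theta) + \theta(1 - \theta)^{2}, & \text{if } 2 \le x < 3, \\
\quad \vdots \\
\sum_{k=0}^{n-1} \theta(1 - \theta)^{k} = 1 - (1 - \theta)^{n}, & \text{if } n - 1 \le x < n, \\
\quad \vdots
\end{cases}
\]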

Clearly, $F_X$ has a countably infinite number of jumps, one at each non-negative integer. ♦
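The closed form 1 - (1 - θ)^n in Example 2 can be compared with the partial sums of the probability mass function. The sketch below is one hedged way to do this in Python; the function names and the test value θ = 0.3 are assumptions made purely for illustration.

```python
import math


def geometric_pmf(x, theta):
    """P{X = x} = theta * (1 - theta)**x for x = 0, 1, 2, ...; 0 otherwise."""
    if isinstance(x, int) and x >= 0:
        return theta * (1.0 - theta) ** x
    return 0.0


def geometric_cdf_sum(x, theta):
    """F(x) as the partial sum of the pmf over the integers 0, ..., floor(x)."""
    if x < 0:
        return 0.0
    return sum(geometric_pmf(k, theta) for k in range(math.floor(x) + 1))


def geometric_cdf_closed(x, theta):
    """Closed form: F(x) = 1 - (1 - theta)**n whenever n - 1 <= x < n."""
    if x < 0:
        return 0.0
    n = math.floor(x) + 1
    return 1.0 - (1.0 - theta) ** n


if __name__ == "__main__":
    theta = 0.3  # illustrative parameter value
    for x in (-1.0, 0.0, 0.5, 1.0, 2.7, 10.0):
        print(x, geometric_cdf_sum(x, theta), geometric_cdf_closed(x, theta))
```

The two columns agree up to floating-point rounding, and the value changes only when x crosses a non-negative integer.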

Definition 1 The Lebesgue-Stieltjes measure $\mu_F$, induced by a continuous distribution function $F$, is the unique probability measure defined, on the Borel σ-field $\mathcal{B}_{\mathbb{R}}$, by
\[
\mu_F(a, b] := F(b) - F(a), \tag{L-S}
\]
for all bounded left-open right-closed subintervals $(a, b]$ of $\mathbb{R}$.

Some explanation of this definition is warranted:

(iii) The collection of bounded left-open right-closed subintervals of $\mathbb{R}$, $\mathcal{P} := \{(a, b] : -\infty < a < b < \infty\}$, is a π-system generating $\mathcal{B}_{\mathbb{R}}$, and hence, even though $\mu_F$ is defined only for the bounded intervals, it is still completely determined by $F$ through (L-S).

(iv) In defining the Lebesgue-Stieltjes measure for continuous distribution functions, we have been restrictive. Indeed, the definition applies to discrete distribution functions, as well. In this case, the domain of $\mu_F$ need not be the Borel σ-field $\mathcal{B}_{\mathbb{R}}$; the power class $2^{\mathbb{R}}$ can serve the purpose. For continuous distribution functions, however, $\mu_F$ (as defined by (L-S)) cannot measure arbitrary subsets of $\mathbb{R}$, and the restriction to $\mathcal{B}_{\mathbb{R}}$ (or something similar) is necessary.
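As a small illustration of (L-S), the measure of a bounded interval (a, b] is just a difference of two values of F. The sketch below assumes, purely for illustration, the exponential distribution function F(x) = 1 - e^{-x} for x > 0 (not discussed in this module); the names `ls_measure` and `exp_cdf` are hypothetical.

```python
import math


def ls_measure(F, a, b):
    """mu_F((a, b]) = F(b) - F(a) for a bounded interval with a < b."""
    if not a < b:
        raise ValueError("expected a < b")
    return F(b) - F(a)


def exp_cdf(x):
    """Illustrative continuous distribution function (an assumption)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


if __name__ == "__main__":
    whole = ls_measure(exp_cdf, 0.0, 2.0)
    parts = ls_measure(exp_cdf, 0.0, 1.0) + ls_measure(exp_cdf, 1.0, 2.0)
    print(whole, parts)  # additivity over the disjoint pieces (0, 1] and (1, 2]
```

The two printed values agree up to rounding, reflecting the additivity of the measure over the disjoint intervals (0, 1] and (1, 2].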

Example 3 Let $X$ be the point mass at $c$ ($\in \mathbb{R}$). Then the distribution function of $X$ is
\[
F(x) =
\begin{cases}
0, & \text{if } x < c, \\
1, & \text{if } x \ge c.
\end{cases}
\]

Define the induced measure $\mu_F =: \delta_c$ by
\[
\delta_c(S) :=
\begin{cases}
1, & \text{if } S \ni c, \\
0, & \text{if } S \not\ni c,
\end{cases}
\]
for sets $S$. The measure $\delta_c$ is often called the (Dirac) delta measure concentrated at $c$. [While not relevant to the point at hand, we state that if $S$ is kept fixed and $c$ is thought to be the argument, then $\delta_c(S) = \mathbf{1}_S(c)$, the indicator of $S$.] ♦

Definition 2 A distribution function $F$ is said to be singular if there exists a linear Borel set $B$ of Lebesgue measure 0 such that $\mu_F(B) = 1$, where $\mu_F$ is the Lebesgue-Stieltjes measure induced by $F$.

Singular functions form an interesting class of functions, and we discuss a few of their properties:

(v) A close look at the definition reveals that all discrete distributions are singular: the Geometric distribution of Example 2 concentrates all its mass on the set of non-negative integers $\mathbb{N} \cup \{0\}$, and it is well known that $\mathrm{Leb}(\mathbb{N} \cup \{0\}) = 0$, where Leb denotes Lebesgue measure on $\mathcal{B}_{\mathbb{R}}$; similarly, the point mass at $c$ (Example 3) assumes the value $c$ with probability 1, and $\mathrm{Leb}\{c\} = 0$. The real objects of attention are the continuous singular distribution functions, such as the one in Distribution Functions I [Example 2]. By contrast, the absolutely continuous distribution function
\[
F(x) = \frac{1}{2} + \frac{1}{\pi} \arctan x, \quad \forall x \in \mathbb{R},
\]
is, clearly, not singular.

(vi) An equivalent characterization of singularity is that a distribution function is singular if and only if its derivative is zero almost everywhere (with respect to Lebesgue measure). This fact can be tested on the functions mentioned in the last remark.
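The derivative criterion for singularity can be probed numerically on the two kinds of functions mentioned above. The following sketch is only a finite-difference check at a few points, not a proof; the helper names, the step `h`, and the parameter value θ = 0.3 are assumptions for illustration.

```python
import math


def arctan_cdf(x):
    """The absolutely continuous F(x) = 1/2 + arctan(x)/pi."""
    return 0.5 + math.atan(x) / math.pi


def geometric_cdf(x, theta=0.3):
    """Geometric distribution function, F(x) = 1 - (1 - theta)**(floor(x) + 1)."""
    if x < 0:
        return 0.0
    return 1.0 - (1.0 - theta) ** (math.floor(x) + 1)


def derivative(F, x, h=1e-6):
    """Central finite-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # Positive derivative at every sampled point: not singular.
    print([derivative(arctan_cdf, x) for x in (-2.0, 0.0, 3.0)])
    # Zero derivative away from the integers: consistent with singularity.
    print([derivative(geometric_cdf, x) for x in (0.5, 2.5, 7.25)])
```

The first function has derivative 1/(π(1 + x²)) > 0 everywhere, while the discrete one is flat between consecutive integers, so its derivative vanishes off a countable set.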

Theorem 1 Every distribution function $F$ can be written in the form
\[
F = \pi_1 F_d + \pi_2 F_{ac} + \pi_3 F_s,
\]
where each $\pi_j \ge 0$ and $\sum_{j=1}^{3} \pi_j = 1$, and $F_d, F_{ac}, F_s$ are, respectively, a discrete, an absolutely continuous, and a continuous singular distribution function. Furthermore, such a decomposition is unique.

We do not prove Theorem 1, but instead consider the simpler Theorem 2. For that, we need the following important Lemma.

Lemma 1 The set of discontinuity points of a distribution function F is at most countable.

Proof: Consider the intervals of the form $(\ell, \ell + 1]$, for $\ell \in \mathbb{Z}$. Fix a positive integer $m$, and let $x_1, x_2, \dots, x_n$ be any $n$ points of discontinuity of $F$ in $(\ell, \ell + 1]$ such that $F$ has a jump of magnitude exceeding $\frac{1}{m}$ at each $x_j$. If the $x_j$'s are ordered (i.e., $x_1 < x_2 < \cdots < x_n$), then, by the non-decreasing property of $F$,
\[
F(\ell) \le F(x_1 - 0) < F(x_1) \le F(x_2 - 0) < F(x_2) \le \cdots \le F(x_n - 0) < F(x_n) \le F(\ell + 1).
\]
Let $p_k$ be the jump at $x_k$; then $p_k = F(x_k) - F(x_k - 0)$, for $k = 1, 2, \dots, n$. Therefore,
\[
\sum_{k=1}^{n} p_k = \sum_{k=1}^{n} \bigl[ F(x_k) - F(x_k - 0) \bigr] \le F(\ell + 1) - F(\ell).
\]
Since $p_k \ge \frac{1}{m}$ for all $k = 1, 2, \dots, n$, we have
\[
\frac{n}{m} \le F(\ell + 1) - F(\ell) \iff n \le m \cdot \bigl[ F(\ell + 1) - F(\ell) \bigr] \le m.
\]
This bound holds for every finite collection of such points, so it remains valid when we consider all points of discontinuity in $(\ell, \ell + 1]$ with jump exceeding $\frac{1}{m}$, and not just a finite number of them. Thus, each interval of unit length has only a finite number (at most $m$) of jumps of magnitude exceeding $\frac{1}{m}$, for each $m$. Every jump exceeds $\frac{1}{m}$ for some $m$, so, as $m$ assumes the values $1, 2, \dots$, we see that the set of discontinuity points in $(\ell, \ell + 1]$ is a countable union of finite sets, and hence at most countable. Note that
\[
\mathbb{R} = \bigcup_{\ell \in \mathbb{Z}} (\ell, \ell + 1];
\]
so, the set of all points of discontinuity of $F$ must be at most countable. □
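The counting bound in the proof can be illustrated on a toy collection of jumps. The sketch below assumes a hypothetical list of jump magnitudes 2^{-k}, k = 1, ..., 50, all located in one unit interval (their total is below 1, as the total mass of a distribution function requires); the name `count_large_jumps` is illustrative.

```python
def count_large_jumps(jumps, m):
    """Number of jumps whose magnitude exceeds 1/m."""
    return sum(1 for p in jumps if p > 1.0 / m)


# Hypothetical jump magnitudes 2**-k for k = 1, ..., 50 (total below 1).
jumps = [2.0 ** -k for k in range(1, 51)]

if __name__ == "__main__":
    for m in (1, 2, 3, 10, 100, 1000):
        # The proof guarantees that this count never exceeds m.
        print(m, count_large_jumps(jumps, m))
```

For each m the count is finite and bounded by m, even though the number of jumps can grow without bound as m increases.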

Theorem 2 Every distribution function $F$ can be written in the form
\[
F = \kappa_1 F_d + \kappa_2 F_c,
\]
where each $\kappa_j \ge 0$ and $\sum_{j=1,2} \kappa_j = 1$, $F_d$ is as before, and $F_c$ is a continuous (though not necessarily absolutely continuous) distribution function. Moreover, this decomposition is unique.

Proof: By Lemma 1, the set of discontinuity points of $F$ is at most countable: let $r_1, r_2, \dots$ be the points of discontinuity. Let $p(r_k) := F(r_k) - F(r_k - 0)$, and define
\[
E_d(x) := \sum_{\{k \,:\, r_k \le x\}} p(r_k), \quad \text{for } x \in \mathbb{R},
\]
and $E_c(x) := F(x) - E_d(x)$. These definitions imply $E_d$ is a step function and $E_c$ is continuous everywhere on $\mathbb{R}$. Moreover, the functions $E_d$ and $E_c$ satisfy all the properties of a distribution function, except that $E_d(\infty)$ and $E_c(\infty)$ need not equal 1 (in general, $E_d(\infty) \le 1$ and $E_c(\infty) \le 1$). $E_d(\infty) = 1$ iff all the mass of $F$ is concentrated at the jumps, in which case $F$ is discrete and we have $F = 1 \cdot E_d + 0$. Similarly, $E_c(\infty) = 1$ iff $F$ has no jumps at all, and $F$ is continuous with $F = 0 + 1 \cdot E_c$. For all other situations, $0 < E_d(\infty), E_c(\infty) < 1$. Define, for all real $x$,
\[
F_d(x) := \frac{E_d(x)}{E_d(\infty)} \quad \text{and} \quad F_c(x) := \frac{E_c(x)}{E_c(\infty)}.
\]
These two functions are distribution functions. Note that, by definition, $E_d(x) + E_c(x) = F(x)$, $\forall x \in \mathbb{R}$, and $E_d(\infty) + E_c(\infty) = F(\infty) = 1$. Therefore, setting $\kappa_1 := E_d(\infty)$ and $\kappa_2 := E_c(\infty)$, we have $F = \kappa_1 F_d + \kappa_2 F_c$.
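The construction of E_d and E_c can be traced on a concrete mixed distribution. The sketch below assumes a hypothetical F that puts mass 0.3 at 0 and spreads the remaining 0.7 uniformly over (0, 1); it also assumes the (finite) list of jump points is known in advance, approximates left limits with a small offset `eps`, and uses a large number as a stand-in for +∞. All names are illustrative.

```python
def uniform_cdf(x):
    """Distribution function of the Uniform(0, 1) distribution."""
    return min(max(x, 0.0), 1.0)


def F(x):
    """Hypothetical mixed distribution: mass 0.3 at 0 plus 0.7 * Uniform(0, 1)."""
    return 0.3 * (1.0 if x >= 0 else 0.0) + 0.7 * uniform_cdf(x)


def make_parts(F, jump_points, eps=1e-9):
    """Build E_d and E_c from F and a known finite list of its jump points."""
    # p(r) = F(r) - F(r - 0), with the left limit approximated by F(r - eps).
    sizes = {r: F(r) - F(r - eps) for r in jump_points}

    def E_d(x):
        return sum(p for r, p in sizes.items() if r <= x)

    def E_c(x):
        return F(x) - E_d(x)

    return E_d, E_c


E_d, E_c = make_parts(F, [0.0])
BIG = 1e6  # stand-in for +infinity (an assumption of this sketch)
kappa1, kappa2 = E_d(BIG), E_c(BIG)


def F_d(x):
    return E_d(x) / kappa1


def F_c(x):
    return E_c(x) / kappa2


if __name__ == "__main__":
    print(kappa1, kappa2)
    for x in (-1.0, 0.0, 0.25, 1.0, 2.0):
        print(x, F(x), kappa1 * F_d(x) + kappa2 * F_c(x))
```

Here κ1 and κ2 come out close to 0.3 and 0.7, F_d is the point mass at 0, F_c is the Uniform(0, 1) distribution function, and the weighted sum reproduces F at every sampled point.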

Now we prove uniqueness of this decomposition. It suffices to prove that $E_d$ and $E_c$ are uniquely determined, since these, in turn, uniquely determine the corresponding distributions. Let $E_d^* + E_c^*$ be another decomposition of $F$, with $E_d^*$ a step function and $E_c^*$ continuous. Then,
\[
E_d + E_c = E_d^* + E_c^* \iff E_c - E_c^* = E_d^* - E_d.
\]
The difference between two continuous functions is continuous, and the difference between two step functions is, again, a step function. A step function of this kind that has a jump is not continuous, and one that has no jumps is identically zero. Hence, the equality $E_c - E_c^* = E_d^* - E_d$ presents a contradiction unless both sides vanish. To avoid this, we must have $E_c = E_c^*$ and $E_d = E_d^*$, and thus, the decomposition is unique. □
