
Transformations and Expectations of random variables

X ∼ F_X(x): a random variable X distributed with CDF F_X. Any Y = g(X) is also a random variable. If both X and Y are continuous random variables, can we find a simple way to characterize F_Y and f_Y (the CDF and PDF of Y) based on the CDF and PDF of X? For the CDF:

   F_Y(y) = P_Y(Y ≤ y)
          = P_Y(g(X) ≤ y)
          = P_X({x ∈ 𝒳 : g(x) ≤ y})        (𝒳 is the sample space of X)
          = ∫_{x ∈ 𝒳 : g(x) ≤ y} f_X(s) ds.

For the PDF: f_Y(y) = F_Y′(y). Caution: we need to keep track of the support of Y. Consider several examples:

1. X ∼ U[−1, 1] and Y = exp(X). That is,

      f_X(x) = 1/2 if x ∈ [−1, 1], and 0 otherwise;
      F_X(x) = 1/2 + (1/2) x,   for x ∈ [−1, 1].

   F_Y(y) = Prob(exp(X) ≤ y) = Prob(X ≤ log y)
          = F_X(log y) = 1/2 + (1/2) log y,   for y ∈ [1/e, e].

   Be careful about the bounds of the support!

   f_Y(y) = (∂/∂y) F_Y(y) = f_X(log y) · (1/y) = 1/(2y),   for y ∈ [1/e, e].

2. X ∼ U[−1, 1] and Y = X².

   F_Y(y) = Prob(X² ≤ y)
          = Prob(−√y ≤ X ≤ √y)
          = F_X(√y) − F_X(−√y)
          = 2 F_X(√y) − 1,   by symmetry: F_X(−√y) = 1 − F_X(√y).

   f_Y(y) = (∂/∂y) F_Y(y) = 2 f_X(√y) · 1/(2√y) = 1/(2√y),   for y ∈ [0, 1].
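As a sanity check on these two derived densities, here is a minimal simulation sketch (not part of the original notes; it assumes numpy is available). It draws X ∼ U[−1, 1], forms Y = exp(X) and Y = X², and compares histogram estimates of f_Y with the formulas 1/(2y) and 1/(2√y):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=1_000_000)   # X ~ U[-1, 1]

    def max_abs_err(y, f_y, lo, hi, skip=0, bins=40):
        # Histogram density estimate of Y compared with the claimed pdf f_y at bin midpoints.
        counts, edges = np.histogram(y, bins=bins, range=(lo, hi), density=True)
        mids = 0.5 * (edges[:-1] + edges[1:])
        return np.max(np.abs(counts - f_y(mids))[skip:])

    # Example 1: Y = exp(X), claimed density 1/(2y) on [1/e, e]
    print(max_abs_err(np.exp(x), lambda y: 1.0 / (2.0 * y), np.exp(-1), np.exp(1)))

    # Example 2: Y = X^2, claimed density 1/(2*sqrt(y)) on [0, 1];
    # skip the first few bins, where the density blows up near y = 0.
    print(max_abs_err(x**2, lambda y: 1.0 / (2.0 * np.sqrt(y)), 0.0, 1.0, skip=4))

Both printed errors should be small (on the order of 0.01).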

 As the first example above showed, it’s easy to derive the CDF and PDF of Y when g(·) is a strictly monotonic function: Theorems 2.1.3, 2.1.5: When g(·) is a strictly increasing function, then

   F_Y(y) = ∫_{−∞}^{g⁻¹(y)} f_X(x) dx = F_X(g⁻¹(y)),
   f_Y(y) = f_X(g⁻¹(y)) · (∂/∂y) g⁻¹(y),   using the chain rule.

Note: by the theorem,

   (∂/∂y) g⁻¹(y) = 1 / [g′(x)] |_{x = g⁻¹(y)}.

When g(·) is a strictly decreasing function, then

   F_Y(y) = ∫_{g⁻¹(y)}^{∞} f_X(x) dx = 1 − F_X(g⁻¹(y)),
   f_Y(y) = −f_X(g⁻¹(y)) · (∂/∂y) g⁻¹(y),   using the chain rule.

These are the change-of-variables formulas for transformations of univariate random variables.
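As an illustration of the increasing-g formula f_Y(y) = f_X(g⁻¹(y)) · (∂/∂y) g⁻¹(y), here is a small sketch (not from the original notes; assumes numpy). It wraps the formula in a helper and checks it on the Y = exp(X), X ∼ U[−1, 1] example from above, where g⁻¹(y) = log y and (g⁻¹)′(y) = 1/y:

    import numpy as np

    def pdf_of_transform(f_x, g_inv, g_inv_prime, y):
        # Change of variables for a strictly increasing g: f_Y(y) = f_X(g^{-1}(y)) * (g^{-1})'(y)
        return f_x(g_inv(y)) * g_inv_prime(y)

    f_x = lambda x: np.where((x >= -1.0) & (x <= 1.0), 0.5, 0.0)   # density of U[-1, 1]
    y = np.linspace(np.exp(-1) + 1e-6, np.exp(1) - 1e-6, 5)
    print(pdf_of_transform(f_x, np.log, lambda u: 1.0 / u, y))     # computed via the formula
    print(1.0 / (2.0 * y))                                          # the answer derived by hand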



Here is a special case of a transformation:

Thm 2.1.10: Let X have a continuous CDF F_X(·) and define the random variable Y = F_X(X). Then Y ∼ U[0, 1], i.e., F_Y(y) = y, for y ∈ [0, 1].

Expected value (Definition 2.2.1): The expected value, or mean, of a random variable g(X) is

   E g(X) = ∫_{−∞}^{∞} g(x) f_X(x) dx     if X is continuous,
   E g(X) = Σ_{x ∈ 𝒳} g(x) P(X = x)       if X is discrete,

provided that the integral or the sum exists. The expectation is a linear operator (just like integration), so that

" n # n X X E α ∗ gi(X) + b = α ∗ Egi(X) + b. i=1 i=1

Note: Expectation is a population average, i.e., you average values of the random variable g(X), weighting by the population density f_X(x).

A statistical experiment yields sample observations X_1, X_2, ..., X_n ∼ F_X. From these sample observations, we can calculate the sample average X̄_n ≡ (1/n) Σ_i X_i. In general, X̄_n ≠ EX. But under some conditions, as n → ∞, X̄_n → EX in some sense (which we discuss later). The expected value is commonly used as a measure of the "central tendency" of a random variable X.
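A quick simulation sketch of this (not from the notes; assumes numpy): with X ∼ U[−1, 1], so that EX = 0, the sample average drifts toward 0 as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (10, 1_000, 100_000):
        x = rng.uniform(-1.0, 1.0, size=n)   # EX = 0
        print(n, x.mean())                    # sample averages shrink toward EX = 0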

Example: but the mean may not exist. Consider a Cauchy random variable, with density f(x) = 1/(π(1 + x²)) for x ∈ (−∞, ∞). Note that

   ∫_{−∞}^{∞} x/(π(1 + x²)) dx
     = ∫_{−∞}^{0} x/(π(1 + x²)) dx + ∫_{0}^{∞} x/(π(1 + x²)) dx
     = lim_{a→−∞} ∫_{a}^{0} x/(π(1 + x²)) dx + lim_{b→∞} ∫_{0}^{b} x/(π(1 + x²)) dx
     = lim_{a→−∞} (1/(2π)) [log(1 + x²)]_{a}^{0} + lim_{b→∞} (1/(2π)) [log(1 + x²)]_{0}^{b}
     = −∞ + ∞,

which is undefined.
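The practical consequence shows up in simulation (a sketch, not part of the notes; assumes numpy): unlike the uniform example above, Cauchy sample averages do not settle down as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    for n in (100, 10_000, 1_000_000):
        x = rng.standard_cauchy(size=n)
        print(n, x.mean())   # keeps jumping around; there is no EX for it to converge to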

 Other measures:

1. Median: med(X) = m such that F_X(m) = 0.5. Robust to outliers, and has a nice invariance property: for Y = g(X) with g(·) monotonic increasing, med(Y) = g(med(X)) (a quick numerical check follows the next item).

2. Mode: Mode(X) = arg max_x f_X(x).
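Here is the promised check of the median's invariance property (not in the original notes; assumes numpy): with X ∼ U[−1, 1] and the increasing map g(x) = exp(x), both med(exp(X)) and exp(med(X)) should be close to exp(0) = 1.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=1_000_000)
    print(np.median(np.exp(x)), np.exp(np.median(x)))   # both close to exp(0) = 1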

 Moments: important class of expectations

For each integer n, the n-th (uncentred) moment of X ∼ F_X(·) is µ′_n ≡ E X^n. The n-th centred moment is µ_n ≡ E(X − µ)^n = E(X − EX)^n. (It is centred around the mean EX.)

For n = 2: µ_2 = E(X − EX)² is the variance of X. √µ_2 is the standard deviation. Important formulas:

• Var(aX + b) = a² Var(X) (variance is not a linear operation)

• Var(X) = E(X²) − (EX)²: alternative formula for the variance, which follows from expanding E(X − EX)² = E(X²) − 2(EX)² + (EX)².

 The moments of a random variable are summarized in the moment generating function.

Definition: the moment-generating function of X is MX (t) ≡ E exp(tX), provided that the expectation exists in some neighborhood t ∈ [−h, h] of zero. Specifically:

   M_X(t) = ∫_{−∞}^{∞} e^{tx} f_X(x) dx      for X continuous,
   M_X(t) = Σ_{x ∈ 𝒳} e^{tx} P(X = x)        for X discrete.

The uncentred moments of X are generated from this function by:

   E X^n = M_X^{(n)}(0) ≡ (d^n/dt^n) M_X(t) |_{t=0},

which is the n-th derivative of the MGF, evaluated at t = 0. When it exists (see below), the MGF provides an alternative description of a distribution. Mathematically, it is essentially a (two-sided) Laplace transform of the density, which can be convenient for certain mathematical calculations.

Example: standard normal, X ∼ N(0, 1):

   M_X(t) = ∫_{−∞}^{∞} (1/√(2π)) exp(tx − x²/2) dx
          = ∫_{−∞}^{∞} (1/√(2π)) exp(−((x − t)² − t²)/2) dx
          = exp(t²/2) · ∫_{−∞}^{∞} (1/√(2π)) exp(−(x − t)²/2) dx
          = exp(t²/2) · 1,

where the last term on the RHS is the integral over the density function of N(t, 1), which integrates to one.

First moment: EX = M_X′(0) = [t · exp(t²/2)]|_{t=0} = 0.
Second moment: EX² = M_X″(0) = [exp(t²/2) + t² exp(t²/2)]|_{t=0} = 1.

In many cases, the MGF can characterize a distribution. But the problem is that it may not exist (e.g., for the Cauchy distribution discussed above). For a random variable X, is its distribution uniquely determined by its moment generating function?
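As a numerical cross-check of this example (not part of the notes; assumes numpy and sympy are installed), the sketch below compares a Monte Carlo estimate of E exp(tX) with exp(t²/2), and recovers the first two moments by differentiating the MGF symbolically at t = 0.

    import numpy as np
    import sympy as sp

    # Monte Carlo estimate of the MGF of N(0, 1) at a few values of t
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000)
    for t in (-1.0, 0.5, 1.0):
        print(t, np.exp(t * x).mean(), np.exp(t**2 / 2))   # the two columns should agree

    # Moments via derivatives of M_X(t) = exp(t^2/2) at t = 0
    t = sp.symbols('t')
    M = sp.exp(t**2 / 2)
    print(sp.diff(M, t).subs(t, 0))      # first moment: 0
    print(sp.diff(M, t, 2).subs(t, 0))   # second moment: 1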

Thm 2.3.11: For X ∼ F_X and Y ∼ F_Y, if M_X and M_Y exist, and M_X(t) = M_Y(t) for all t in some neighborhood of zero, then F_X(u) = F_Y(u) for all u. Note that if the MGF exists, then the random variable has moments of all orders (because the MGF is infinitely differentiable), and the MGF characterizes its distribution. The converse is not necessarily true: a random variable may have moments of all orders while its MGF does not exist (e.g., the log-normal random variable Y = exp(X), with X ∼ N(0, 1)).

Characteristic function: The characteristic function of a random variable g(x) is defined as

   φ_{g(x)}(t) = E_x exp(it g(x)) = ∫_{−∞}^{+∞} exp(it g(x)) f(x) dx,

where f(x) is the density for x. This is also called the "Fourier transform". Features of the characteristic function:

• The CF always exists. This follows from the equality e^{itx} = cos(tx) + i · sin(tx): both the real and imaginary parts of the integrand are bounded functions.

• Consider a symmetric density function, with f(−x) = f(x) (symmetric around zero). Then the resulting φ(t) is real-valued and symmetric around zero.

• The CF completely determines the distribution of X (every CDF has a unique characteristic function).

• Let X have characteristic function φ_X(t). Then Y = aX + b has characteristic function φ_Y(t) = e^{ibt} φ_X(at).

• If X and Y are independent, with characteristic functions φ_X(t) and φ_Y(t), then φ_{X+Y}(t) = φ_X(t) φ_Y(t).

• φ(0) = 1.

• For a given characteristic function φ_X(t) such that ∫_{−∞}^{+∞} |φ_X(t)| dt < ∞ (here |·| denotes the modulus of a complex number: for x + iy, we have |x + iy| = √(x² + y²)), the corresponding density f_X(x) is given by the inverse Fourier transform, which is

   f_X(x) = (1/(2π)) ∫_{−∞}^{+∞} φ_X(t) exp(−itx) dt.

Example: N(0, 1) distribution, with density f(x) = (1/√(2π)) exp(−x²/2). Take as given that the characteristic function of N(0, 1) is

   φ_{N(0,1)}(t) = ∫ (1/√(2π)) exp(itx − x²/2) dx = exp(−t²/2).      (1)

Hence the inversion formula yields

   f(x) = (1/(2π)) ∫_{−∞}^{+∞} exp(−t²/2) exp(−itx) dt.

Now making the substitution z = −t, we get

   (1/(2π)) ∫_{−∞}^{+∞} exp(izx − z²/2) dz
     = (1/√(2π)) φ_{N(0,1)}(x) = (1/√(2π)) exp(−x²/2) = f_{N(0,1)}(x).      (using Eq. (1))
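This inversion can also be checked numerically (a sketch, not from the notes; assumes numpy and scipy): integrate φ(t) exp(−itx) over a wide grid of t and compare the result with the normal density.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def density_from_cf(x, cf):
        # Inverse Fourier transform: f(x) = (1/2pi) * integral of cf(t) * exp(-i t x) dt
        real_part = lambda t: (cf(t) * np.exp(-1j * t * x)).real
        val, _ = quad(real_part, -50, 50)   # the N(0,1) CF decays fast, so [-50, 50] is plenty
        return val / (2 * np.pi)

    cf_normal = lambda t: np.exp(-t**2 / 2)
    for x in (0.0, 1.0, 2.5):
        print(x, density_from_cf(x, cf_normal), norm.pdf(x))   # the two columns should match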

• The characteristic function also summarizes the moments of a random variable. Specifically, note that the h-th derivative of φ(t) is

   φ^{(h)}(t) = ∫_{−∞}^{+∞} i^h g(x)^h exp(it g(x)) f(x) dx.      (2)


Hence, assuming that the h-th moment, denoted µ^h_{g(x)} ≡ E[g(x)^h], exists, it is equal to

   µ^h_{g(x)} = φ^{(h)}(0) / i^h.

Hence, assuming that the required moments exist, we can use Taylor’s theorem to expand the characteristic function around t = 0 to get:

   φ(t) = 1 + (it/1!) µ^1_{g(x)} + ((it)²/2!) µ²_{g(x)} + ... + ((it)^k/k!) µ^k_{g(x)} + o(t^k).

• Cauchy distribution, cont’d: The characteristic function for the Cauchy distribution is φ(t) = exp(−|t|). This is not differentiable at t = 0, which by Eq. (2) is saying that its mean does not exist. Hence, the expansion of the characteristic function in this case is invalid.
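To see this last point numerically (a sketch, not part of the notes; assumes numpy), one can estimate the Cauchy CF by the sample average of exp(itX) and observe that it tracks exp(−|t|), which has a kink at t = 0:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_cauchy(size=1_000_000)
    for t in (-0.5, -0.1, 0.0, 0.1, 0.5):
        emp_cf = np.exp(1j * t * x).mean()            # sample analogue of E exp(itX)
        print(t, emp_cf.real, np.exp(-abs(t)))        # columns agree; note the kink at t = 0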
