Transformations and Expectations of random variables
X ∼ F_X(x): a random variable X distributed with CDF F_X. Any function Y = g(X) is also a random variable. If both X and Y are continuous random variables, can we find a simple way to characterize F_Y and f_Y (the CDF and PDF of Y) based on the CDF and PDF of X? For the CDF:
  F_Y(y) = P_Y(Y ≤ y)
         = P_Y(g(X) ≤ y)
         = P_X({x ∈ X : g(x) ≤ y})    (X is the sample space for X)
         = ∫_{x ∈ X : g(x) ≤ y} f_X(s) ds.
For the PDF: f_Y(y) = F_Y′(y). Caution: need to consider the support of Y. Consider several examples:
1. X ∼ U[−1, 1] and Y = exp(X). That is:

   f_X(x) = 1/2 if x ∈ [−1, 1], 0 otherwise
   F_X(x) = 1/2 + x/2,   for x ∈ [−1, 1].
   F_Y(y) = Prob(exp(X) ≤ y) = Prob(X ≤ log y)
          = F_X(log y) = 1/2 + (1/2) log y,   for y ∈ [1/e, e].

   Be careful about the bounds of the support!

   f_Y(y) = (∂/∂y) F_Y(y) = f_X(log y) · (1/y) = 1/(2y),   for y ∈ [1/e, e].
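As a quick numerical sanity check of this derived CDF (a Python sketch, not part of the original derivation; the evaluation point y = 2 is an arbitrary choice):

```python
import math
import random

random.seed(0)
n = 200_000
# Draw X ~ U[-1, 1] and transform: Y = exp(X)
ys = [math.exp(random.uniform(-1.0, 1.0)) for _ in range(n)]

# Empirical CDF of Y at y0 = 2 vs the derived F_Y(y) = 1/2 + (1/2) log y
y0 = 2.0
empirical = sum(y <= y0 for y in ys) / n
theoretical = 0.5 + 0.5 * math.log(y0)
assert abs(empirical - theoretical) < 0.01
```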
2. X ∼ U[−1, 1] and Y = X²

   F_Y(y) = Prob(X² ≤ y)
          = Prob(−√y ≤ X ≤ √y)
          = F_X(√y) − F_X(−√y)
          = 2F_X(√y) − 1,   by symmetry: F_X(−√y) = 1 − F_X(√y).

   f_Y(y) = (∂/∂y) F_Y(y) = 2 f_X(√y) · 1/(2√y) = 1/(2√y),   for y ∈ [0, 1].
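This example can be checked the same way by simulation (a sketch; note that for X ∼ U[−1, 1] the derived CDF simplifies to F_Y(y) = 2(1/2 + √y/2) − 1 = √y):

```python
import random

random.seed(1)
n = 200_000
# Draw X ~ U[-1, 1] and transform: Y = X^2
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]

# Derived CDF: F_Y(y) = 2 F_X(sqrt(y)) - 1 = sqrt(y) for y in [0, 1]
y0 = 0.25
empirical = sum(y <= y0 for y in ys) / n
theoretical = y0 ** 0.5  # = 0.5
assert abs(empirical - theoretical) < 0.01
```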
As the first example above showed, it’s easy to derive the CDF and PDF of Y when g(·) is a strictly monotonic function: Theorems 2.1.3, 2.1.5: When g(·) is a strictly increasing function, then
  F_Y(y) = ∫_{−∞}^{g^{-1}(y)} f_X(x) dx = F_X(g^{-1}(y))
  f_Y(y) = f_X(g^{-1}(y)) · (∂/∂y) g^{-1}(y),   using the chain rule.
Note: by the inverse function theorem,
  (∂/∂y) g^{-1}(y) = 1 / g′(x) |_{x = g^{-1}(y)}.
When g(·) is a strictly decreasing function, then

  F_Y(y) = ∫_{g^{-1}(y)}^{∞} f_X(x) dx = 1 − F_X(g^{-1}(y))
  f_Y(y) = −f_X(g^{-1}(y)) · (∂/∂y) g^{-1}(y),   using the chain rule.
These are the change-of-variables formulas for transformations of univariate random variables.
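The monotone-case formula f_Y(y) = f_X(g^{-1}(y)) · |(∂/∂y) g^{-1}(y)| can be implemented generically; here is a minimal sketch (the helper name `pdf_of_transform` is mine, and the derivative is taken by finite difference rather than analytically), which recovers the density from Example 1 above:

```python
import math

def pdf_of_transform(f_X, g_inv, y, h=1e-6):
    """Change-of-variables density f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y)/dy|,
    with the derivative approximated by a central finite difference."""
    d = (g_inv(y + h) - g_inv(y - h)) / (2 * h)
    return f_X(g_inv(y)) * abs(d)

# Example 1: X ~ U[-1, 1], Y = exp(X), so g^{-1} = log and f_Y(y) = 1/(2y)
f_X = lambda x: 0.5 if -1.0 <= x <= 1.0 else 0.0
f_Y = pdf_of_transform(f_X, math.log, 2.0)
assert abs(f_Y - 1.0 / (2 * 2.0)) < 1e-6
```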
Here is a special case of a transformation:
Thm 2.1.10: Let X have a continuous CDF F_X(·) and define the random variable Y = F_X(X). Then Y ∼ U[0, 1], i.e., F_Y(y) = y for y ∈ [0, 1].

Expected value (Definition 2.2.1): The expected value, or mean, of a random variable g(X) is

  Eg(X) = ∫_{−∞}^{∞} g(x) f_X(x) dx   if X is continuous,
  Eg(X) = Σ_{x ∈ X} g(x) P(X = x)     if X is discrete,

provided that the integral or the sum exists. The expectation is a linear operator (just like integration), so that
  E[ Σ_{i=1}^{n} α g_i(X) + b ] = α Σ_{i=1}^{n} E g_i(X) + b.
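Linearity can be checked exactly on a small discrete distribution (a sketch; the pmf below is an arbitrary illustrative example, not from the notes):

```python
# Linearity of expectation on a discrete distribution: P(X = x) for x in {0, 1, 2}
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def E(g):
    """Eg(X) = sum over the support of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

a, b = 3.0, 7.0
g1, g2 = (lambda x: x), (lambda x: x * x)

# E[a*g1(X) + a*g2(X) + b] should equal a*E g1(X) + a*E g2(X) + b
lhs = E(lambda x: a * g1(x) + a * g2(x) + b)
rhs = a * E(g1) + a * E(g2) + b
assert abs(lhs - rhs) < 1e-12
```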
Note: Expectation is a population average, i.e., you average values of the random variable g(X) weighting by the population density fX (x).
A statistical experiment yields sample observations X_1, X_2, ..., X_n ∼ F_X. From these sample observations we can calculate the sample average X̄_n ≡ (1/n) Σ_i X_i. In general X̄_n ≠ EX, but under some conditions, as n → ∞, X̄_n → EX in some sense (which we discuss later). The expected value is a commonly used measure of the "central tendency" of a random variable X.
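The distinction between the sample average and the population mean is easy to see in simulation (a sketch, using X ∼ U[−1, 1], for which EX = 0):

```python
import random

random.seed(3)

def sample_avg(n):
    """Average of n draws from U[-1, 1]; the population mean EX is 0."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n

# For large n the sample average is close to EX = 0, though not exactly equal:
# the standard deviation of the average is (1/sqrt(3)) / sqrt(n) ~ 0.0013 here.
avg = sample_avg(200_000)
assert abs(avg - 0.0) < 0.01
```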
Example: But the mean may not exist. Consider a Cauchy random variable, with density f(x) = 1/(π(1 + x²)) for x ∈ (−∞, ∞). Note that

  ∫_{−∞}^{∞} x/(π(1 + x²)) dx = ∫_{−∞}^{0} x/(π(1 + x²)) dx + ∫_{0}^{∞} x/(π(1 + x²)) dx
    = lim_{a→−∞} ∫_{a}^{0} x/(π(1 + x²)) dx + lim_{b→∞} ∫_{0}^{b} x/(π(1 + x²)) dx
    = lim_{a→−∞} (1/(2π)) [log(1 + x²)]_{a}^{0} + lim_{b→∞} (1/(2π)) [log(1 + x²)]_{0}^{b}
    = −∞ + ∞,   which is undefined.
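The divergence of the positive half-integral is easy to see numerically using the closed form above (a sketch):

```python
import math

# Truncated positive-half integral of x * f(x) for the Cauchy density:
#   int_0^b x / (pi * (1 + x^2)) dx = log(1 + b^2) / (2*pi),
# which grows without bound as b -> infinity, so the mean is undefined.
half_integral = lambda b: math.log(1.0 + b * b) / (2.0 * math.pi)

vals = [half_integral(b) for b in (1e2, 1e4, 1e6)]
# Each factor of 100 in b adds about log(100^2)/(2*pi) = log(100)/pi ~ 1.47,
# so the truncated integral never settles down to a finite limit.
assert vals[1] - vals[0] > 1.4
assert vals[2] - vals[1] > 1.4
```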
Other measures:
1. Median: med(X) = m such that F_X(m) = 0.5. Robust to outliers, and has a nice invariance property: for Y = g(X) with g(·) monotonic increasing, med(Y) = g(med(X)).
2. Mode: Mode(X) = argmax_x f_X(x), the point at which the density is highest.
Moments: important class of expectations
For each integer n, the n-th (uncentred) moment of X ∼ F_X(·) is μ′_n ≡ EX^n.
The n-th centred moment is μ_n ≡ E(X − μ)^n = E(X − EX)^n. (It is centred around the mean EX.)
For n = 2: μ_2 = E(X − EX)² is the variance of X, and √μ_2 is the standard deviation. Important formulas:
• Var(aX + b) = a² Var X (variance is not a linear operator)
• Var X = E(X²) − (EX)²: an alternative formula for the variance
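Both formulas can be verified on simulated data (a sketch; the constants a = 3, b = 7 are arbitrary choices):

```python
import random

random.seed(4)
n = 100_000
xs = [random.uniform(0.0, 1.0) for _ in range(n)]

def mean(vs):
    return sum(vs) / len(vs)

def var(vs):
    """Sample variance: average squared deviation from the sample mean."""
    m = mean(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

a, b = 3.0, 7.0
ys = [a * x + b for x in xs]

# Var(aX + b) = a^2 Var X: the additive shift b drops out entirely
assert abs(var(ys) - a * a * var(xs)) < 1e-7

# Var X = E(X^2) - (EX)^2: the two formulas agree
alt = mean([x * x for x in xs]) - mean(xs) ** 2
assert abs(var(xs) - alt) < 1e-9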
The moments of a random variable are summarized in the moment generating function.
Definition: the moment-generating function of X is MX (t) ≡ E exp(tX), provided that the expectation exists in some neighborhood t ∈ [−h, h] of zero. Specifically:
  M_X(t) = ∫_{−∞}^{∞} e^{tx} f_X(x) dx   for X continuous,
  M_X(t) = Σ_{x ∈ X} e^{tx} P(X = x)     for X discrete.
The uncentred moments of X are generated from this function by:
  EX^n = M_X^{(n)}(0) ≡ (d^n/dt^n) M_X(t) |_{t=0},

which is the n-th derivative of the MGF, evaluated at t = 0. When it exists (see below), the MGF provides an alternative description of a probability distribution. Mathematically, it is a Laplace transform, which can be convenient for certain mathematical calculations.
Example: standard normal distribution.

  M_X(t) = ∫_{−∞}^{∞} (1/√(2π)) exp(tx − x²/2) dx
         = ∫_{−∞}^{∞} (1/√(2π)) exp(−((x − t)² − t²)/2) dx
         = exp(t²/2) · ∫_{−∞}^{∞} (1/√(2π)) exp(−(x − t)²/2) dx
         = exp(t²/2) · 1,

where the last integral on the RHS is over the density function of N(t, 1), which integrates to one.
First moment: EX = M′_X(0) = t · exp(t²/2) |_{t=0} = 0.
Second moment: EX² = M″_X(0) = [exp(t²/2) + t² · exp(t²/2)] |_{t=0} = 1.
In many cases the MGF characterizes a distribution, but the problem is that it may not exist (e.g. the Cauchy distribution). For a RV X, is its distribution uniquely determined by its moment generating function?
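The moment formula EX^n = M_X^{(n)}(0) can be checked numerically for the standard normal MGF derived above, using central finite differences in place of exact derivatives (a sketch):

```python
import math

M = lambda t: math.exp(0.5 * t * t)  # standard normal MGF, M(t) = exp(t^2/2)
h = 1e-4

# Central differences approximate M'(0) and M''(0)
first = (M(h) - M(-h)) / (2 * h)
second = (M(h) - 2 * M(0.0) + M(-h)) / (h * h)

assert abs(first - 0.0) < 1e-8   # EX   = M'(0)  = 0
assert abs(second - 1.0) < 1e-6  # EX^2 = M''(0) = 1
```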
Thm 2.3.11: For X ∼ F_X and Y ∼ F_Y, if M_X and M_Y exist, and M_X(t) = M_Y(t) for all t in some neighborhood of zero, then F_X(u) = F_Y(u) for all u. Note that if the MGF exists, then it characterizes a random variable all of whose moments are finite (because the MGF is infinitely differentiable in a neighborhood of zero). The converse is not necessarily true: a random variable can have all of its moments finite and yet have no MGF (e.g. the log-normal random variable: X ∼ N(0, 1), Y = exp(X)).
Characteristic function: The characteristic function of a random variable g(X) is defined as

  φ_{g(X)}(t) = E exp(itg(X)) = ∫_{−∞}^{+∞} exp(itg(x)) f(x) dx,

where f(x) is the density of X. This is also called the "Fourier transform". Features of the characteristic function:
• The CF always exists. This follows from the equality e^{itx} = cos(tx) + i · sin(tx): both the real and imaginary parts of the integrand are bounded functions.
• Consider a symmetric density function, with f(−x) = f(x) (symmetric around zero). Then the resulting φ(t) is real-valued and symmetric around zero.
• The CF completely determines the distribution of X (every CDF has a unique characteristic function).
• Let X have characteristic function φ_X(t). Then Y = aX + b has characteristic function φ_Y(t) = e^{ibt} φ_X(at).
• If X and Y are independent, with characteristic functions φ_X(t) and φ_Y(t), then φ_{X+Y}(t) = φ_X(t) φ_Y(t).
• φ(0) = 1.
• For a given characteristic function φ_X(t) such that ∫_{−∞}^{+∞} |φ_X(t)| dt < ∞,¹ the corresponding density f_X(x) is given by the inverse Fourier transform:

  f_X(x) = (1/(2π)) ∫_{−∞}^{+∞} φ_X(t) exp(−itx) dt.
Example: N(0, 1) distribution, with density f(x) = (1/√(2π)) exp(−x²/2). Take as given that the characteristic function of N(0, 1) is

  φ_{N(0,1)}(t) = ∫ (1/√(2π)) exp(itx − x²/2) dx = exp(−t²/2).   (1)

Hence the inversion formula yields

  f(x) = (1/(2π)) ∫_{−∞}^{+∞} exp(−t²/2) exp(−itx) dt.

Now making the substitution z = −t, we get

  (1/(2π)) ∫_{−∞}^{+∞} exp(izx − z²/2) dz
    = (1/√(2π)) φ_{N(0,1)}(x) = (1/√(2π)) exp(−x²/2) = f_{N(0,1)}(x).   (Use Eq. (1).)
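The inversion formula can also be verified by direct numerical integration (a sketch; the truncation range [−30, 30] and step count are arbitrary choices, and since φ is real and even only the real part cos(tx) · φ(t) of the integrand contributes):

```python
import math

phi = lambda t: math.exp(-0.5 * t * t)  # CF of N(0,1), taken as given above

def invert(x, lo=-30.0, hi=30.0, steps=60_000):
    """Approximate (1/2pi) * int phi(t) e^{-itx} dt by a midpoint Riemann sum."""
    dt = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt
        total += phi(t) * math.cos(t * x)  # imaginary part integrates to zero
    return total * dt / (2.0 * math.pi)

# The inversion should recover the N(0,1) density f(x) = exp(-x^2/2)/sqrt(2*pi)
for x in (0.0, 1.0):
    assert abs(invert(x) - math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)) < 1e-6
```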
• The characteristic function also summarizes the moments of a random variable. Specifically, the h-th derivative of φ(t) is

  φ^{(h)}(t) = ∫_{−∞}^{+∞} i^h g(x)^h exp(itg(x)) f(x) dx.   (2)
¹ Here |·| denotes the modulus of a complex number: for x + iy, |x + iy| = √(x² + y²).
Hence, assuming the h-th moment, denoted μ^h_{g(X)} ≡ E[g(X)^h], exists, it is equal to

  μ^h_{g(X)} = φ^{(h)}(0) / i^h.
Hence, assuming that the required moments exist, we can use Taylor’s theorem to expand the characteristic function around t = 0 to get:
  φ(t) = 1 + (it/1!) μ¹_{g(X)} + ((it)²/2!) μ²_{g(X)} + ... + ((it)^k/k!) μ^k_{g(X)} + o(t^k).
• Cauchy distribution, cont'd: The characteristic function for the Cauchy distribution is φ(t) = exp(−|t|). This is not differentiable at t = 0, which, in view of Eq. (2), reflects the fact that its mean does not exist. Hence the Taylor expansion of the characteristic function is invalid in this case.
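By contrast, for a CF that is smooth at t = 0, such as the N(0,1) case φ(t) = exp(−t²/2), both the derivative formula μ^h = φ^{(h)}(0)/i^h and the Taylor expansion can be checked numerically (a sketch; N(0,1) has μ¹ = 0 and μ² = 1):

```python
import math

phi = lambda t: math.exp(-0.5 * t * t)  # N(0,1) characteristic function
h = 1e-4

# Second derivative at 0 by central difference; mu_2 = phi''(0) / i^2 = -phi''(0)
second = (phi(h) - 2 * phi(0.0) + phi(-h)) / (h * h)
mu2 = -second
assert abs(mu2 - 1.0) < 1e-6

# Taylor check: with mu_1 = 0 and mu_2 = 1, phi(t) ~ 1 - t^2/2 for small t
t = 0.01
assert abs(phi(t) - (1.0 - 0.5 * t * t)) < 1e-7
```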