Transformations and Expectations of random variables
X ∼ F_X(x): a random variable X distributed with CDF F_X. Any function Y = g(X) is also a random variable. If both X and Y are continuous random variables, can we find a simple way to characterize F_Y and f_Y (the CDF and PDF of Y) based on the CDF and PDF of X? For the CDF:
  F_Y(y) = P_Y(Y ≤ y)
         = P_Y(g(X) ≤ y)
         = P_X({x ∈ X : g(x) ≤ y})    (X is the sample space for X)
         = ∫_{x ∈ X : g(x) ≤ y} f_X(s) ds.
For the PDF: f_Y(y) = F_Y′(y). Caution: need to consider the support of Y. Consider several examples:
1. X ∼ U[−1, 1] and Y = exp(X). That is:

   f_X(x) = 1/2 if x ∈ [−1, 1], 0 otherwise
   F_X(x) = 1/2 + x/2,   for x ∈ [−1, 1].
   F_Y(y) = Prob(exp(X) ≤ y) = Prob(X ≤ log y)
          = F_X(log y) = 1/2 + (1/2) log y,   for y ∈ [1/e, e].

   Be careful about the bounds of the support!

   f_Y(y) = (∂/∂y) F_Y(y) = f_X(log y) · (1/y) = 1/(2y),   for y ∈ [1/e, e].
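As a quick numerical sanity check of this derived CDF (a Python sketch, not part of the original derivation; the evaluation point y = 2 is an arbitrary choice):

```python
import math
import random

random.seed(0)
n = 200_000
# Draw X ~ U[-1, 1] and transform: Y = exp(X)
ys = [math.exp(random.uniform(-1.0, 1.0)) for _ in range(n)]

# Empirical CDF of Y at y0 = 2 vs the derived F_Y(y) = 1/2 + (1/2) log y
y0 = 2.0
empirical = sum(y <= y0 for y in ys) / n
theoretical = 0.5 + 0.5 * math.log(y0)
assert abs(empirical - theoretical) < 0.01
```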
2. X ∼ U[−1, 1] and Y = X²

   F_Y(y) = Prob(X² ≤ y)
          = Prob(−√y ≤ X ≤ √y)
          = F_X(√y) − F_X(−√y)
          = 2F_X(√y) − 1,   by symmetry: F_X(−√y) = 1 − F_X(√y).

   f_Y(y) = (∂/∂y) F_Y(y) = 2 f_X(√y) · 1/(2√y) = 1/(2√y),   for y ∈ [0, 1].
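This example can be checked the same way by simulation (a sketch; note that for X ∼ U[−1, 1] the derived CDF simplifies to F_Y(y) = 2(1/2 + √y/2) − 1 = √y):

```python
import random

random.seed(1)
n = 200_000
# Draw X ~ U[-1, 1] and transform: Y = X^2
ys = [random.uniform(-1.0, 1.0) ** 2 for _ in range(n)]

# Derived CDF: F_Y(y) = 2 F_X(sqrt(y)) - 1 = sqrt(y) for y in [0, 1]
y0 = 0.25
empirical = sum(y <= y0 for y in ys) / n
theoretical = y0 ** 0.5  # = 0.5
assert abs(empirical - theoretical) < 0.01
```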
As the first example above showed, it’s easy to derive the CDF and PDF of Y when g(·) is a strictly monotonic function: Theorems 2.1.3, 2.1.5: When g(·) is a strictly increasing function, then
  F_Y(y) = ∫_{−∞}^{g^{-1}(y)} f_X(x) dx = F_X(g^{-1}(y))
  f_Y(y) = f_X(g^{-1}(y)) · (∂/∂y) g^{-1}(y),   using the chain rule.
Note: by the inverse function theorem,
  (∂/∂y) g^{-1}(y) = 1 / g′(x) |_{x = g^{-1}(y)}.
When g(·) is a strictly decreasing function, then

  F_Y(y) = ∫_{g^{-1}(y)}^{∞} f_X(x) dx = 1 − F_X(g^{-1}(y))
  f_Y(y) = −f_X(g^{-1}(y)) · (∂/∂y) g^{-1}(y),   using the chain rule.
These are the change-of-variables formulas for transformations of univariate random variables.
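The monotone-case formula f_Y(y) = f_X(g^{-1}(y)) · |(∂/∂y) g^{-1}(y)| can be implemented generically; here is a minimal sketch (the helper name `pdf_of_transform` is mine, and the derivative is taken by finite difference rather than analytically), which recovers the density from Example 1 above:

```python
import math

def pdf_of_transform(f_X, g_inv, y, h=1e-6):
    """Change-of-variables density f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y)/dy|,
    with the derivative approximated by a central finite difference."""
    d = (g_inv(y + h) - g_inv(y - h)) / (2 * h)
    return f_X(g_inv(y)) * abs(d)

# Example 1: X ~ U[-1, 1], Y = exp(X), so g^{-1} = log and f_Y(y) = 1/(2y)
f_X = lambda x: 0.5 if -1.0 <= x <= 1.0 else 0.0
f_Y = pdf_of_transform(f_X, math.log, 2.0)
assert abs(f_Y - 1.0 / (2 * 2.0)) < 1e-6
```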
Here is a special case of a transformation:
Thm 2.1.10: Let X have a continuous CDF F_X(·) and define the random variable Y = F_X(X). Then Y ∼ U[0, 1], i.e., F_Y(y) = y for y ∈ [0, 1].

Expected value (Definition 2.2.1): The expected value, or mean, of a random variable g(X) is

  Eg(X) = ∫_{−∞}^{∞} g(x) f_X(x) dx   if X is continuous,
  Eg(X) = Σ_{x ∈ X} g(x) P(X = x)     if X is discrete,

provided that the integral or the sum exists. The expectation is a linear operator (just like integration), so that
  E[ Σ_{i=1}^{n} α g_i(X) + b ] = α Σ_{i=1}^{n} E g_i(X) + b.
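Linearity can be checked exactly on a small discrete distribution (a sketch; the pmf below is an arbitrary illustrative example, not from the notes):

```python
# Linearity of expectation on a discrete distribution: P(X = x) for x in {0, 1, 2}
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def E(g):
    """Eg(X) = sum over the support of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

a, b = 3.0, 7.0
g1, g2 = (lambda x: x), (lambda x: x * x)

# E[a*g1(X) + a*g2(X) + b] should equal a*E g1(X) + a*E g2(X) + b
lhs = E(lambda x: a * g1(x) + a * g2(x) + b)
rhs = a * E(g1) + a * E(g2) + b
assert abs(lhs - rhs) < 1e-12
```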
Note: Expectation is a population average, i.e., you average values of the random variable g(X) weighting by the population density fX (x).
A statistical experiment yields sample observations X_1, X_2, ..., X_n ∼ F_X. From these sample observations we can calculate the sample average X̄_n ≡ (1/n) Σ_i X_i. In general X̄_n ≠ EX, but under some conditions, as n → ∞, X̄_n → EX in some sense (which we discuss later). The expected value is a commonly used measure of the "central tendency" of a random variable X.
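The distinction between the sample average and the population mean is easy to see in simulation (a sketch, using X ∼ U[−1, 1], for which EX = 0):

```python
import random

random.seed(3)

def sample_avg(n):
    """Average of n draws from U[-1, 1]; the population mean EX is 0."""
    return sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n

# For large n the sample average is close to EX = 0, though not exactly equal:
# the standard deviation of the average is (1/sqrt(3)) / sqrt(n) ~ 0.0013 here.
avg = sample_avg(200_000)
assert abs(avg - 0.0) < 0.01
```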
Example: But the mean may not exist. Consider a Cauchy random variable, with density f(x) = 1/(π(1 + x²)) for x ∈ (−∞, ∞). Note that

  ∫_{−∞}^{∞} x/(π(1 + x²)) dx = ∫_{−∞}^{0} x/(π(1 + x²)) dx + ∫_{0}^{∞} x/(π(1 + x²)) dx
    = lim_{a→−∞} ∫_{a}^{0} x/(π(1 + x²)) dx + lim_{b→∞} ∫_{0}^{b} x/(π(1 + x²)) dx
    = lim_{a→−∞} (1/(2π)) [log(1 + x²)]_{a}^{0} + lim_{b→∞} (1/(2π)) [log(1 + x²)]_{0}^{b}
    = −∞ + ∞,   which is undefined.
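The divergence of the positive half-integral is easy to see numerically using the closed form above (a sketch):

```python
import math

# Truncated positive-half integral of x * f(x) for the Cauchy density:
#   int_0^b x / (pi * (1 + x^2)) dx = log(1 + b^2) / (2*pi),
# which grows without bound as b -> infinity, so the mean is undefined.
half_integral = lambda b: math.log(1.0 + b * b) / (2.0 * math.pi)

vals = [half_integral(b) for b in (1e2, 1e4, 1e6)]
# Each factor of 100 in b adds about log(100^2)/(2*pi) = log(100)/pi ~ 1.47,
# so the truncated integral never settles down to a finite limit.
assert vals[1] - vals[0] > 1.4
assert vals[2] - vals[1] > 1.4
```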
Other measures:
1. Median: med(X) = m such that F_X(m) = 0.5. Robust to outliers, and has a nice invariance property: for Y = g(X) with g(·) monotonic increasing, med(Y) = g(med(X)).
2. Mode: Mode(X) = argmax_x f_X(x), the point at which the density is highest.
Moments: important class of expectations
For each integer n, the n-th (uncentred) moment of X ∼ F_X(·) is μ′_n ≡ EX^n.
The n-th centred moment is μ_n ≡ E(X − μ)^n = E(X − EX)^n. (It is centred around the mean EX.)
For n = 2: μ_2 = E(X − EX)² is the variance of X, and √μ_2 is the standard deviation. Important formulas:
• Var(aX + b) = a² Var X (variance is not a linear operator)
• Var X = E(X²) − (EX)²: an alternative formula for the variance
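Both formulas can be verified on simulated data (a sketch; the constants a = 3, b = 7 are arbitrary choices):

```python
import random

random.seed(4)
n = 100_000
xs = [random.uniform(0.0, 1.0) for _ in range(n)]

def mean(vs):
    return sum(vs) / len(vs)

def var(vs):
    """Sample variance: average squared deviation from the sample mean."""
    m = mean(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

a, b = 3.0, 7.0
ys = [a * x + b for x in xs]

# Var(aX + b) = a^2 Var X: the additive shift b drops out entirely
assert abs(var(ys) - a * a * var(xs)) < 1e-7

# Var X = E(X^2) - (EX)^2: the two formulas agree
alt = mean([x * x for x in xs]) - mean(xs) ** 2
assert abs(var(xs) - alt) < 1e-9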
The moments of a random variable are summarized in the moment generating function.
Definition: the moment-generating function of X is MX (t) ≡ E exp(tX), provided that the expectation exists in some neighborhood t ∈ [−h, h] of zero. Specifically:
  M_X(t) = ∫_{−∞}^{∞} e^{tx} f_X(x) dx   for X continuous,
  M_X(t) = Σ_{x ∈ X} e^{tx} P(X = x)     for X discrete.
The uncentred moments of X are generated from this function by:
  EX^n = M_X^{(n)}(0) ≡ (d^n/dt^n) M_X(t) |_{t=0},

which is the n-th derivative of the MGF, evaluated at t = 0. When it exists (see below), the MGF provides an alternative description of a probability distribution. Mathematically, it is a Laplace transform, which can be convenient for certain mathematical calculations.
Example: standard normal distribution.

  M_X(t) = ∫_{−∞}^{∞} (1/√(2π)) exp(tx − x²/2) dx
         = ∫_{−∞}^{∞} (1/√(2π)) exp(−((x − t)² − t²)/2) dx
         = exp(t²/2) · ∫_{−∞}^{∞} (1/√(2π)) exp(−(x − t)²/2) dx
         = exp(t²/2) · 1,

where the last integral on the RHS is over the density function of N(t, 1), which integrates to one.
First moment: EX = M′_X(0) = t · exp(t²/2) |_{t=0} = 0.
Second moment: EX² = M″_X(0) = [exp(t²/2) + t² · exp(t²/2)] |_{t=0} = 1.
In many cases the MGF characterizes a distribution, but the problem is that it may not exist (e.g. the Cauchy distribution). For a RV X, is its distribution uniquely determined by its moment generating function?
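The moment formula EX^n = M_X^{(n)}(0) can be checked numerically for the standard normal MGF derived above, using central finite differences in place of exact derivatives (a sketch):

```python
import math

M = lambda t: math.exp(0.5 * t * t)  # standard normal MGF, M(t) = exp(t^2/2)
h = 1e-4

# Central differences approximate M'(0) and M''(0)
first = (M(h) - M(-h)) / (2 * h)
second = (M(h) - 2 * M(0.0) + M(-h)) / (h * h)

assert abs(first - 0.0) < 1e-8   # EX   = M'(0)  = 0
assert abs(second - 1.0) < 1e-6  # EX^2 = M''(0) = 1
```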
Thm 2.3.11: For X ∼ F_X and Y ∼ F_Y, if M_X and M_Y exist, and M_X(t) = M_Y(t) for all t in some neighborhood of zero, then F_X(u) = F_Y(u) for all u. Note that if the MGF exists, then it characterizes a random variable all of whose moments are finite (because the MGF is infinitely differentiable in a neighborhood of zero). The converse is not necessarily true: a random variable can have all of its moments finite and yet have no MGF (e.g. the log-normal random variable: X ∼ N(0, 1), Y = exp(X)).
Characteristic function: The characteristic function of a random variable g(X) is defined as

  φ_{g(X)}(t) = E exp(itg(X)) = ∫_{−∞}^{+∞} exp(itg(x)) f(x) dx,

where f(x) is the density of X. This is also called the "Fourier transform". Features of the characteristic function:
• The CF always exists. This follows from the equality e^{itx} = cos(tx) + i · sin(tx): both the real and imaginary parts of the integrand are bounded functions.
• Consider a symmetric density function, with f(−x) = f(x) (symmetric around zero). Then the resulting φ(t) is real-valued and symmetric around zero.
• The CF completely determines the distribution of X (every CDF has a unique characteristic function).
• Let X have characteristic function φ_X(t). Then Y = aX + b has characteristic function φ_Y(t) = e^{ibt} φ_X(at).
• If X and Y are independent, with characteristic functions φ_X(t) and φ_Y(t), then φ_{X+Y}(t) = φ_X(t) φ_Y(t).
• φ(0) = 1.
• For a given characteristic function φ_X(t) such that ∫_{−∞}^{+∞} |φ_X(t)| dt < ∞,¹ the corresponding density f_X(x) is given by the inverse Fourier transform:

  f_X(x) = (1/(2π)) ∫_{−∞}^{+∞} φ_X(t) exp(−itx) dt.
Example: N(0, 1) distribution, with density f(x) = (1/√(2π)) exp(−x²/2). Take as given that the characteristic function of N(0, 1) is

  φ_{N(0,1)}(t) = ∫ (1/√(2π)) exp(itx − x²/2) dx = exp(−t²/2).   (1)

Hence the inversion formula yields

  f(x) = (1/(2π)) ∫_{−∞}^{+∞} exp(−t²/2) exp(−itx) dt.

Now making the substitution z = −t, we get

  (1/(2π)) ∫_{−∞}^{+∞} exp(izx − z²/2) dz
    = (1/√(2π)) φ_{N(0,1)}(x) = (1/√(2π)) exp(−x²/2) = f_{N(0,1)}(x).   (Use Eq. (1).)
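The inversion formula can also be verified by direct numerical integration (a sketch; the truncation range [−30, 30] and step count are arbitrary choices, and since φ is real and even only the real part cos(tx) · φ(t) of the integrand contributes):

```python
import math

phi = lambda t: math.exp(-0.5 * t * t)  # CF of N(0,1), taken as given above

def invert(x, lo=-30.0, hi=30.0, steps=60_000):
    """Approximate (1/2pi) * int phi(t) e^{-itx} dt by a midpoint Riemann sum."""
    dt = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * dt
        total += phi(t) * math.cos(t * x)  # imaginary part integrates to zero
    return total * dt / (2.0 * math.pi)

# The inversion should recover the N(0,1) density f(x) = exp(-x^2/2)/sqrt(2*pi)
for x in (0.0, 1.0):
    assert abs(invert(x) - math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)) < 1e-6
```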
• The characteristic function also summarizes the moments of a random variable. Specifically, the h-th derivative of φ(t) is

  φ^{(h)}(t) = ∫_{−∞}^{+∞} i^h g(x)^h exp(itg(x)) f(x) dx.   (2)
¹ Here |·| denotes the modulus of a complex number: for x + iy, |x + iy| = √(x² + y²).
Hence, assuming the h-th moment, denoted μ^h_{g(X)} ≡ E[g(X)^h], exists, it is equal to

  μ^h_{g(X)} = φ^{(h)}(0) / i^h.
Hence, assuming that the required moments exist, we can use Taylor’s theorem to expand the characteristic function around t = 0 to get:
  φ(t) = 1 + (it/1!) μ¹_{g(X)} + ((it)²/2!) μ²_{g(X)} + ... + ((it)^k/k!) μ^k_{g(X)} + o(t^k).
• Cauchy distribution, cont'd: The characteristic function for the Cauchy distribution is φ(t) = exp(−|t|). This is not differentiable at t = 0, which, in view of Eq. (2), reflects the fact that its mean does not exist. Hence the Taylor expansion of the characteristic function is invalid in this case.
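By contrast, for a CF that is smooth at t = 0, such as the N(0,1) case φ(t) = exp(−t²/2), both the derivative formula μ^h = φ^{(h)}(0)/i^h and the Taylor expansion can be checked numerically (a sketch; N(0,1) has μ¹ = 0 and μ² = 1):

```python
import math

phi = lambda t: math.exp(-0.5 * t * t)  # N(0,1) characteristic function
h = 1e-4

# Second derivative at 0 by central difference; mu_2 = phi''(0) / i^2 = -phi''(0)
second = (phi(h) - 2 * phi(0.0) + phi(-h)) / (h * h)
mu2 = -second
assert abs(mu2 - 1.0) < 1e-6

# Taylor check: with mu_1 = 0 and mu_2 = 1, phi(t) ~ 1 - t^2/2 for small t
t = 0.01
assert abs(phi(t) - (1.0 - 0.5 * t * t)) < 1e-7
```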