CHAPTER 1: DISTRIBUTION THEORY

Basic Concepts
• Random Variable (R.V.)
  discrete random variable: probability mass function
  continuous random variable: probability density function
• Random Vector
  multivariate version of a random variable: joint probability mass function, joint density function
• Formal definitions based on measure theory
  An R.V. is a measurable function on a probability space, which consists of sets in a σ-field equipped with a probability measure;
  probability mass function: the Radon–Nikodym derivative of the measure induced by the random variable w.r.t. a counting measure;
  density function: the Radon–Nikodym derivative of the measure induced by the random variable w.r.t. the Lebesgue measure.
• Descriptive quantities of a univariate distribution
  cumulative distribution function
  F_X(x) ≡ P(X ≤ x)
  moments: E[X^k], k = 1, 2, ... (mean if k = 1); central moments: E[(X − µ)^k] (variance if k = 2)
  quantile function
  F_X^{−1}(p) ≡ inf{x : F_X(x) > p}
  the skewness: E[(X − µ)^3]/Var(X)^{3/2}
  the kurtosis: E[(X − µ)^4]/Var(X)^2
• Characteristic function (c.f.)
  definition: φ_X(t) = E[exp{itX}]
  the c.f. uniquely determines the distribution:
  F_X(b) − F_X(a) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−ita} − e^{−itb})/(it)] φ_X(t) dt
  f_X(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} φ_X(t) dt for a continuous r.v.
  moment generating function (if it exists): m_X(t) = E[exp{tX}]
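As a quick numerical illustration of the c.f. definition, the Monte Carlo average of exp{itX} can be compared with a known closed form. The sketch below assumes X ∼ N(µ, σ^2), whose c.f. is exp{itµ − σ^2 t^2/2}; the parameter values, the grid of t, and the sample size are arbitrary choices.

    import numpy as np

    # Empirical c.f. (sample average of exp(itX)) vs. the closed form
    # phi(t) = exp(i*t*mu - sigma^2 * t^2 / 2) for X ~ N(mu, sigma^2).
    rng = np.random.default_rng(0)
    mu, sigma, n = 1.0, 2.0, 200_000
    x = rng.normal(mu, sigma, size=n)

    t = np.linspace(-2, 2, 9)
    empirical = np.exp(1j * np.outer(t, x)).mean(axis=1)
    analytic = np.exp(1j * t * mu - 0.5 * sigma**2 * t**2)

    print(np.max(np.abs(empirical - analytic)))  # small; shrinks like 1/sqrt(n)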
• Descriptive quantities of two R.V.s (X, Y)
  covariance and correlation: Cov(X, Y), corr(X, Y)
  conditional density or probability (the f's become probability mass functions if the R.V.s are discrete):
  f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y)
  E[X|Y = y] = ∫ x f_{X|Y}(x|y) dx
  independence:
  P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y)
  f_{X,Y}(x, y) = f_X(x)f_Y(y)
• Double expectation formulae
  E[X] = E[E[X|Y]]
  Var(X) = E[Var(X|Y)] + Var(E[X|Y])
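A quick simulation check of these formulas, under an assumed Gamma–Poisson setup: if Y ∼ Gamma(a, θ) and X given Y = y ∼ Poisson(y), then E[X|Y] = Var(X|Y) = Y, so the formulas give E[X] = aθ and Var(X) = aθ + aθ^2. The sketch below (a, θ, and the sample size are arbitrary choices) should reproduce both values.

    import numpy as np

    # Check E[X] = E[E[X|Y]] and Var(X) = E[Var(X|Y)] + Var(E[X|Y])
    # with Y ~ Gamma(a, theta) and X|Y=y ~ Poisson(y), so E[X|Y] = Var(X|Y) = Y.
    rng = np.random.default_rng(0)
    a, theta, n = 3.0, 2.0, 1_000_000
    y = rng.gamma(a, theta, size=n)         # draws of Y
    x = rng.poisson(y)                      # one draw of X for each draw of Y

    print(x.mean(), a * theta)                 # both ~ 6.0
    print(x.var(), a * theta + a * theta**2)   # both ~ 18.0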
• The double expectation formulae are useful for computing the moments of X when X given Y has a simple distribution!

In-class exercise: Compute E[X] and Var(X) if Y ∼ Bernoulli(p) and X given Y = y follows N(µ_y, σ_y^2).

Discrete Random Variables
• Binomial distribution
  A Binomial r.v. is the total number of successes in n independent Bernoulli trials. Binomial(n, p):
  P(S_n = k) = (n choose k) p^k (1 − p)^{n−k}, k = 0, ..., n
  E[S_n] = np, Var(S_n) = np(1 − p)
  φ_{S_n}(t) = (1 − p + pe^{it})^n
  Summation of two independent Binomial r.v.'s:
Binomial(n1, p) + Binomial(n2, p) ∼ Binomial(n1 + n2, p)
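A simulation sketch that checks this identity empirically (the formal proof, via characteristic functions, is the exercise below); n1, n2, p, and the sample size are arbitrary choices.

    import numpy as np
    from math import comb

    # Empirical pmf of Binomial(n1, p) + Binomial(n2, p) vs. the exact
    # Binomial(n1 + n2, p) pmf.
    rng = np.random.default_rng(0)
    n1, n2, p, n = 5, 7, 0.3, 500_000
    s = rng.binomial(n1, p, size=n) + rng.binomial(n2, p, size=n)

    for k in range(n1 + n2 + 1):
        exact = comb(n1 + n2, k) * p**k * (1 - p)**(n1 + n2 - k)
        print(k, (s == k).mean(), round(exact, 5))  # agree up to Monte Carlo error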
In-class exercise: Prove the summation statement above.

• Negative Binomial distribution
  It describes the distribution of the number of independent Bernoulli trials needed until the m-th success. NB(m, p):
  P(W_m = k) = (k − 1 choose m − 1) p^m (1 − p)^{k−m}, k = m, m + 1, ...
  E[W_m] = m/p, Var(W_m) = m/p^2 − m/p
  Summation of two independent r.v.'s:
NB(m1, p) + NB(m2, p) ∼ NB(m1 + m2, p)
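The same kind of simulation check works here; one caveat in the sketch below is that NumPy's negative_binomial counts the failures before the m-th success, so m is added back to recover the trial count W_m. Parameter values are arbitrary choices.

    import numpy as np

    # Check NB(m1, p) + NB(m2, p) ~ NB(m1 + m2, p) by matching first two moments.
    # NumPy's negative_binomial returns the number of FAILURES before the m-th
    # success, so the number of trials is failures + m.
    rng = np.random.default_rng(0)
    m1, m2, p, n = 4, 6, 0.4, 1_000_000
    w = (rng.negative_binomial(m1, p, size=n) + m1) \
        + (rng.negative_binomial(m2, p, size=n) + m2)

    m = m1 + m2
    print(w.mean(), m / p)               # both ~ 25.0
    print(w.var(), m / p**2 - m / p)     # both ~ 37.5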
In-class exercise: Prove the summation statement above.

• Hypergeometric distribution
  It describes the distribution of the number of black balls when we randomly draw n balls from an urn containing M black and (N − M) white balls (urn model). HG(N, M, n):