Statistics for Data Science
MSc Data Science, WiSe 2019/20
Prof. Dr. Dirk Ostwald

(2) Random variables

Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

Definition and notation
Random variables and distributions
• Let (Ω, 𝒜, ℙ) be a probability space and let X : Ω → 𝒳 be a function.
• Let 𝒮 be a σ-algebra on 𝒳.
• For every S ∈ 𝒮, let the preimage of S be

  X⁻¹(S) := {ω ∈ Ω | X(ω) ∈ S}.   (1)

• If X⁻¹(S) ∈ 𝒜 for all S ∈ 𝒮, then X is called measurable.
• Let X : Ω → 𝒳 be measurable. Every S ∈ 𝒮 gets allocated the probability

  ℙ_X : 𝒮 → [0, 1], S ↦ ℙ_X(S) := ℙ(X⁻¹(S)) = ℙ({ω ∈ Ω | X(ω) ∈ S}).   (2)

• X is called a random variable and ℙ_X is called the distribution of X.
• (𝒳, 𝒮, ℙ_X) is a probability space.
• With 𝒳 = ℝ and 𝒮 = ℬ (the Borel σ-algebra), the probability space (ℝ, ℬ, ℙ_X) takes center stage.

Definition (Random variable)
Let (Ω, 𝒜, ℙ) denote a probability space. A (real-valued) random variable is a mapping

  X : Ω → ℝ, ω ↦ X(ω),   (3)

with the measurability property

  {ω ∈ Ω | X(ω) ∈ S} ∈ 𝒜 for all S ∈ 𝒮.   (4)

Remarks
• Random variables are neither "random" nor "variables".
• Intuitively, ω ∈ Ω gets randomly selected according to ℙ and X(ω) is realized.
• The distributions (probability measures) of random variables are central.

Random variables and distributions
• Let (Ω, 𝒜, ℙ) and (𝒳, 𝒮, ℙ_X) denote probability spaces for X : Ω → 𝒳.
• The following notations for events A ∈ 𝒜 with respect to X are conventional:

  {X ∈ S} := {ω ∈ Ω | X(ω) ∈ S},  S ⊂ 𝒳
  {X = x} := {ω ∈ Ω | X(ω) = x},  x ∈ 𝒳
  {X ≤ x} := {ω ∈ Ω | X(ω) ≤ x},  x ∈ 𝒳
  {X < x} := {ω ∈ Ω | X(ω) < x},  x ∈ 𝒳

• These conventions entail the following conventions for distributions:

  ℙ_X(X ∈ S) = ℙ({X ∈ S}) = ℙ({ω ∈ Ω | X(ω) ∈ S}),  S ⊂ 𝒳
  ℙ_X(X ≤ x) = ℙ({X ≤ x}) = ℙ({ω ∈ Ω | X(ω) ≤ x}),  x ∈ 𝒳

• Often, the random variable subscript in distribution symbols is omitted:

  ℙ(X ∈ S) = ℙ_X(X ∈ S),  S ⊂ 𝒳
  ℙ(X ≤ x) = ℙ_X(X ≤ x),  x ∈ 𝒳

• Distributions can be defined using cumulative distribution functions, probability mass functions, and probability density functions.

Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

Cumulative distribution functions
Definition (Cumulative distribution function)
The cumulative distribution function (CDF) of a random variable X is defined as

  P : ℝ → [0, 1], x ↦ P(x) := ℙ(X ≤ x).   (5)

Remarks
• CDFs can be used to define distributions.
• CDFs exist for both discrete and continuous random variables.

Example (Cumulative distribution function)
Consider a random variable with outcome space 𝒳 = {0, 1, 2} and distribution defined by

  ℙ(X = 0) = 1/4,  ℙ(X = 1) = 1/2,  ℙ(X = 2) = 1/4.   (6)

Then its cumulative distribution function is given by

  P : ℝ → [0, 1], x ↦ P(x) :=
      0    for x < 0,
      1/4  for 0 ≤ x < 1,
      3/4  for 1 ≤ x < 2,
      1    for x ≥ 2.   (7)

Remarks
• P is right-continuous.
• P is defined for all x ∈ ℝ, while X ∈ {0, 1, 2}.

Identity of CDFs
Let X have CDF P and let Y have CDF Q. If P(x) = Q(x) for all x, then ℙ(X ∈ S) = ℙ(Y ∈ S) for all events S ∈ 𝒮.

Properties of CDFs
A function P : ℝ → [0, 1] is a CDF for some probability measure ℙ if and only if P satisfies the following conditions:
(1) P is non-decreasing: x₁ < x₂ implies P(x₁) ≤ P(x₂).
(2) P is normalized: lim_{x → −∞} P(x) = 0 and lim_{x → ∞} P(x) = 1.
(3) P is right-continuous: P(x) = P(x⁺) for all x, where P(x⁺) := lim_{y → x, y > x} P(y).
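The step-function CDF of the discrete example above can be sketched in a few lines of Python. This is a minimal illustration, not part of the slides; the function name `cdf` is ours:

```python
# Minimal sketch of the CDF from the discrete example:
# X takes values in {0, 1, 2} with probabilities 1/4, 1/2, 1/4.

def cdf(x):
    """Piecewise-constant CDF P(x) = P(X <= x), defined for every real x."""
    pmf = {0: 0.25, 1: 0.5, 2: 0.25}
    return sum(p for xi, p in pmf.items() if xi <= x)

print(cdf(-1.0))  # 0     (x < 0)
print(cdf(0.5))   # 0.25  (0 <= x < 1)
print(cdf(1.0))   # 0.75  (the jump at x = 1 is included: right-continuity)
print(cdf(3.0))   # 1.0   (normalization)
```

Note that `cdf` accepts any real argument although X only takes values in {0, 1, 2}, mirroring the remark above.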
Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

Probability mass and density functions
Definition (Probability mass functions, discrete random variables)
A random variable X is discrete if it takes on countably many values in 𝒳 := {x₁, x₂, ...}. The probability mass function (PMF) of X is defined as

  p : 𝒳 → [0, 1], x ↦ p(x) := ℙ(X = x).   (8)

Remarks
• A set is countable if it is finite or bijectively related to ℕ.
• A PMF is non-negative: p(x) ≥ 0 for all x ∈ 𝒳.
• A PMF is normalized: Σᵢ p(xᵢ) = 1.
• The CDF of a PMF is P(x) = ℙ(X ≤ x) = Σ_{xᵢ ≤ x} p(xᵢ).
• The CDF of a PMF is also referred to as a cumulative mass function (CMF).

Example (Bernoulli random variable)
Let X be a random variable with outcome set 𝒳 = {0, 1} and probability mass function

  p : 𝒳 → [0, 1], x ↦ p(x) := µˣ(1 − µ)^(1−x) for µ ∈ [0, 1].   (9)

Then X is said to be distributed according to a Bernoulli distribution with parameter µ ∈ [0, 1], for which we write X ∼ Bern(µ). We denote the probability mass function of a Bernoulli random variable by

  Bern(x; µ) := µˣ(1 − µ)^(1−x).   (10)

Remarks
• A Bernoulli random variable can be used to model a single biased coin flip with outcomes "failure" (0) and "success" (1).
• µ is the probability for X to take the value 1:

  ℙ(X = 1) = µ¹(1 − µ)^(1−1) = µ.   (11)

Definition (Probability density functions, continuous random variables)
A random variable X is continuous if there exists a function

  p : ℝ → ℝ≥0, x ↦ p(x)   (12)

such that
• p(x) ≥ 0 for all x ∈ ℝ,
• ∫_{−∞}^{∞} p(x) dx = 1,
• ℙ(a ≤ X ≤ b) = ∫_{a}^{b} p(x) dx for all a, b ∈ ℝ with a ≤ b.

Remarks
• PDFs can take on values larger than 1, and ℙ(X = a) = ∫_{a}^{a} p(x) dx = 0.
• Probabilities are obtained from PDFs by integration:
• (Probability) mass = (probability) density × (set) volume.
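The Bernoulli PMF of Eq. (10) lends itself to a one-line implementation. The following is an illustrative sketch, not from the slides; the parameter value µ = 0.7 is arbitrary:

```python
def bern(x, mu):
    """Bernoulli PMF Bern(x; mu) = mu^x * (1 - mu)^(1 - x) for x in {0, 1}."""
    return mu ** x * (1 - mu) ** (1 - x)

mu = 0.7  # arbitrary illustrative parameter
print(bern(1, mu))                # P(X = 1) = mu
print(bern(0, mu))                # P(X = 0) = 1 - mu
print(bern(0, mu) + bern(1, mu))  # the PMF is normalized: sums to 1
```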
• The CDF of a PDF is P(x) = ∫_{−∞}^{x} p(ξ) dξ, and thus p(x) = (d/dx) P(x).
• The CDF of a PDF is also referred to as a cumulative density function.

Probability mass and density functions
Example (Gaussian random variable, standard normal variable)
Let X be a random variable with outcome set ℝ and probability density function

  p : ℝ → ℝ>0, x ↦ p(x) := 1/√(2πσ²) · exp(−(x − µ)²/(2σ²)).   (13)

Then X is said to be distributed according to a Gaussian distribution with parameters µ ∈ ℝ and σ² > 0, for which we write X ∼ N(µ, σ²). We abbreviate the PDF of a Gaussian random variable by

  N(x; µ, σ²) := 1/√(2πσ²) · exp(−(x − µ)²/(2σ²)).   (14)

A Gaussian random variable with µ = 0 and σ² = 1 is said to be distributed according to a standard normal distribution and is often referred to as a Z variable.

Remarks
• The parameter µ specifies the location of highest probability density.
• The parameter σ² specifies the width of the distribution.
• The term 1/√(2πσ²) is the normalization constant for exp(−(x − µ)²/(2σ²)).

Example (Uniform random variables)
Let X be a discrete random variable with a finite outcome set 𝒳 and probability mass function

  p : 𝒳 → ℝ≥0, x ↦ p(x) := 1/|𝒳|.   (15)

Then X is said to be distributed according to a discrete uniform distribution, for which we write X ∼ U(|𝒳|). We abbreviate the PMF of a discrete uniform random variable by

  U(x; |𝒳|) := 1/|𝒳|.   (16)

Similarly, let X be a continuous random variable with probability density function

  p : ℝ → ℝ≥0, x ↦ p(x) := 1/(b − a) for x ∈ [a, b], and 0 for x ∉ [a, b].   (17)

Then X is said to be distributed according to a continuous uniform distribution with parameters a and b, for which we write X ∼ U(a, b).
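The Gaussian PDF of Eq. (14) translates directly into code. A sketch using only the Python standard library; the parameter values µ = 1, σ² = 4 are arbitrary illustrations:

```python
import math

def gauss_pdf(x, mu, sigma2):
    """Gaussian PDF N(x; mu, sigma^2) = (2*pi*sigma^2)^(-1/2) * exp(-(x - mu)^2 / (2*sigma2))."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

mu, sigma2 = 1.0, 4.0  # arbitrary illustrative parameters

# The density peaks at x = mu ...
assert gauss_pdf(mu, mu, sigma2) > gauss_pdf(mu + 0.1, mu, sigma2)

# ... and integrates to 1; a crude Riemann sum over [-20, 22) comes close:
dx = 0.01
total = sum(gauss_pdf(-20 + k * dx, mu, sigma2) * dx for k in range(4200))
print(round(total, 3))  # close to 1.0
```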
We abbreviate the PDF of a continuous uniform random variable by

  U(x; a, b) := 1/(b − a).   (18)

Properties of cumulative density functions
• ℙ(X > x) = 1 − P(x) (exceedance distribution function)
• ℙ(x < X ≤ y) = P(y) − P(x) (interval probability)
• With the properties of the Riemann integral, we have

  P(y) − P(x) = ℙ(x < X < y) = ℙ(x ≤ X < y) = ℙ(x < X ≤ y) = ℙ(x ≤ X ≤ y).   (19)

Definition (Inverse cumulative distribution function)
Let X be a random variable with CDF P. Then the inverse cumulative distribution function or quantile function of X is defined as

  P⁻¹ : [0, 1] → ℝ, q ↦ P⁻¹(q) := inf{x | P(x) > q}.   (20)

If P is invertible, i.e., strictly increasing and continuous, then P⁻¹(q) is the unique real number x such that P(x) = q.

Remarks
• P⁻¹(0.25) is called the first quartile.
• P⁻¹(0.50) is called the median or second quartile.
• P⁻¹(q) is also referred to as the qth quantile.

Example (CDF and inverse CDF for Gaussian random variables)
Let X be a univariate Gaussian random variable with expectation parameter µ and variance parameter σ².
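For the step CDF of the discrete example in Eq. (7), the generalized inverse inf{x | P(x) > q} of Eq. (20) can be computed by scanning the support in order. This is an illustrative sketch, not part of the slides:

```python
def cdf(x):
    """CDF of the discrete example: X in {0, 1, 2} with probabilities 1/4, 1/2, 1/4."""
    pmf = {0: 0.25, 1: 0.5, 2: 0.25}
    return sum(p for xi, p in pmf.items() if xi <= x)

def quantile(q, support=(0, 1, 2)):
    """Generalized inverse P^{-1}(q) = inf{x | P(x) > q}. For a step CDF the
    infimum is attained at a support point, so scanning the support suffices."""
    for x in support:
        if cdf(x) > q:
            return x
    return support[-1]

print(quantile(0.25))  # 1: P(0) = 0.25 is not > 0.25, but P(1) = 0.75 is
print(quantile(0.50))  # 1: the median
print(quantile(0.80))  # 2
```

Because this CDF is a step function it is not invertible, which is exactly why the infimum in definition (20) is needed here.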