
Statistics for Data Science

MSc Data Science WiSe 2019/20

Prof. Dr. Dirk Ostwald

1 (2) Random variables

2 Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

3 Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

4 Definition and notation

Random variables and distributions

• Let (Ω, A, P) be a probability space and let X : Ω → 𝒳 be a function.
• Let S be a σ-algebra on 𝒳.
• For every S ∈ S let the preimage of S be

X⁻¹(S) := {ω ∈ Ω | X(ω) ∈ S}. (1)

• If X⁻¹(S) ∈ A for all S ∈ S, then X is called measurable.
• Let X : Ω → 𝒳 be measurable. All S ∈ S get allocated the probability

PX : S → [0, 1], S ↦ PX(S) := P(X⁻¹(S)) = P({ω ∈ Ω | X(ω) ∈ S}). (2)

• X is called a random variable and PX is called the distribution of X.
• (𝒳, S, PX) is a probability space.
• With 𝒳 = R and S = B, the probability space (R, B, PX) takes center stage.
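To make the pushforward construction in (2) concrete, here is a minimal Python sketch on a finite sample space; the names Omega, P, and X below are illustrative choices, not taken from the slides (with a finite Ω and the power-set σ-algebra, every map is measurable).

```python
# Minimal sketch: pushforward distribution P_X(S) = P(X^{-1}(S)) on a finite probability space.
# Omega, P, and X are illustrative choices; any finite example works the same way.

Omega = ["a", "b", "c", "d"]                          # sample space
P = {"a": 0.125, "b": 0.25, "c": 0.5, "d": 0.125}     # probability of each elementary outcome
X = {"a": 0, "b": 1, "c": 1, "d": 2}                  # a map X : Omega -> {0, 1, 2}

def preimage(S):
    """X^{-1}(S) = {omega in Omega | X(omega) in S}."""
    return {omega for omega in Omega if X[omega] in S}

def P_X(S):
    """Distribution of X: P_X(S) = P(X^{-1}(S))."""
    return sum(P[omega] for omega in preimage(S))

print(P_X({1}))        # P(X = 1) = P({b, c}) = 0.75
print(P_X({0, 1, 2}))  # P_X of the whole outcome space = 1.0
```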

5 Definition and notation

Random variables and distributions

6 Definition and notation

Definition (Random variable)

Let (Ω, A, P) denote a probability space. A (real-valued) random variable is a mapping

X : Ω → R, ω ↦ X(ω), (3)

with the measurability property

{ω ∈ Ω | X(ω) ∈ S} ∈ A for all S ∈ S. (4)

Remarks
• Random variables are neither “random” nor “variables”.
• Intuitively, ω ∈ Ω gets randomly selected according to P and X(ω) is realized.
• The distributions (probability measures) of random variables are central.

7 Definition and notation

Random variables and distributions

• Let (Ω, A, P) and (𝒳, S, PX) denote probability spaces for X : Ω → 𝒳.
• The following notations for events A ∈ A w.r.t. X are conventional:

{X ∈ S} := {ω ∈ Ω | X(ω) ∈ S},  S ⊂ 𝒳
{X = x} := {ω ∈ Ω | X(ω) = x},  x ∈ 𝒳
{X ≤ x} := {ω ∈ Ω | X(ω) ≤ x},  x ∈ 𝒳
{X < x} := {ω ∈ Ω | X(ω) < x},  x ∈ 𝒳

• These conventions entail the following conventions for distributions:

PX(X ∈ S) = P({X ∈ S}) = P({ω ∈ Ω | X(ω) ∈ S}),  S ⊂ 𝒳
PX(X ≤ x) = P({X ≤ x}) = P({ω ∈ Ω | X(ω) ≤ x}),  x ∈ 𝒳

• Often, the random variable subscript in distribution symbols is omitted:

P(X ∈ S) = PX(X ∈ S),  S ⊂ 𝒳
P(X ≤ x) = PX(X ≤ x),  x ∈ 𝒳

• Distributions can be defined using cumulative distribution functions, probability mass functions, and probability density functions.

8 Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

9 Cumulative distribution functions

Definition (Cumulative distribution function)

The cumulative distribution function (CDF) of a random variable X is defined as

P : R → [0, 1], x ↦ P(x) := P(X ≤ x). (5)

Remarks
• CDFs can be used to define distributions.
• CDFs exist for both discrete and continuous random variables.

10 Cumulative distribution functions

Example (Cumulative distribution function)

Consider a random variable with outcome space 𝒳 = {0, 1, 2} and distribution defined by

P(X = 0) = 1/4,  P(X = 1) = 1/2,  P(X = 2) = 1/4. (6)

Then its distribution function is given by

P : R → [0, 1], x ↦ P(x) :=
  0,   x < 0,
  1/4, 0 ≤ x < 1,
  3/4, 1 ≤ x < 2,
  1,   x ≥ 2.  (7)

Remarks
• P is right-continuous.

• P is defined for all x ∈ R, while X ∈ {0, 1, 2}.
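As a quick check on equations (6) and (7), the following sketch implements this step CDF in Python; the helper name cdf is mine, not from the slides.

```python
# CDF of the discrete random variable with P(X=0) = 1/4, P(X=1) = 1/2, P(X=2) = 1/4.
def cdf(x):
    if x < 0:
        return 0.0
    elif x < 1:
        return 0.25
    elif x < 2:
        return 0.75
    else:
        return 1.0

# cdf is defined for all real x, although X only takes values in {0, 1, 2};
# it jumps at 0, 1, 2 and is right-continuous.
print([cdf(x) for x in (-1.0, 0.0, 0.5, 1.0, 1.999, 2.0, 3.0)])
# [0.0, 0.25, 0.25, 0.75, 0.75, 1.0, 1.0]
```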

11 Cumulative distribution functions

Identity of CDFs
Let X have CDF P and let Y have CDF Q. If P(x) = Q(x) for all x, then P(X ∈ S) = P(Y ∈ S) for all events S ∈ S.

Properties of CDFs

A function P : R → [0, 1] is a CDF for some probability P if and only if P satisfies the following conditions:

(1) P is non-decreasing: x1 < x2 implies that P (x1) ≤ P (x2).

(2) P is normalized: lim_{x→−∞} P(x) = 0 and lim_{x→∞} P(x) = 1.

(3) P is right-continuous: P(x) = P(x⁺) for all x, where P(x⁺) := lim_{y→x, y>x} P(y).

12 Random variables
• Definition and notation
• Cumulative distribution functions
• Probability mass and density functions

13 Probability mass and density functions

Definition (Probability mass functions, discrete random variables)

A random variable X is discrete if it takes on countably many values in 𝒳 := {x1, x2, ...}. The probability mass function of X is defined as

p : 𝒳 → [0, 1], x ↦ p(x) := P(X = x). (8)

Remarks
• A set is countable if it is finite or bijectively related to N.
• A PMF is non-negative: p(x) ≥ 0 for all x ∈ 𝒳.
• A PMF is normalized: ∑_i p(x_i) = 1.
• The CDF of a PMF is P(x) = P(X ≤ x) = ∑_{x_i ≤ x} p(x_i).
• The CDF of a PMF is also referred to as a cumulative mass function (CMF).
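The remarks above can be illustrated numerically. The sketch below uses an arbitrary PMF on four values (the numbers are illustrative, not from the slides) to check non-negativity and normalization and to evaluate the CDF as a sum over x_i ≤ x.

```python
import numpy as np

# An arbitrary PMF on the outcome set {0, 1, 2, 3}, for illustration only.
x = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.4, 0.3, 0.2])

print(np.all(p >= 0), np.isclose(p.sum(), 1.0))   # True True: non-negative and normalized

def cdf(t):
    """P(t) = P(X <= t) = sum of p(x_i) over all x_i <= t."""
    return float(p[x <= t].sum())

print([cdf(t) for t in (-1, 0, 1.5, 3)])          # [0.0, 0.1, 0.5, 1.0]
```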

14 Probability mass and density functions

Example (Bernoulli random variable)

Let X be a random variable with outcome set 𝒳 = {0, 1} and probability mass function

p : 𝒳 → [0, 1], x ↦ p(x) := µ^x (1 − µ)^(1−x) for µ ∈ [0, 1]. (9)

Then X is said to be distributed according to a Bernoulli distribution with parameter µ ∈ [0, 1], for which we write X ∼ Bern(µ). We denote the probability mass function of a Bernoulli random variable by

Bern(x; µ) := µ^x (1 − µ)^(1−x). (10)

Remarks
• A Bernoulli random variable can be used to model a single biased coin flip with outcomes “failure” 0 and “success” 1.
• µ is the probability for X to take the value 1,

P(X = 1) = µ^1 (1 − µ)^(1−1) = µ. (11)
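A short sketch of the Bernoulli PMF (9)/(10), cross-checked against scipy.stats.bernoulli; the helper bern_pmf and the parameter value µ = 0.75 are illustrative assumptions.

```python
from scipy import stats

mu = 0.75                                  # arbitrary success probability for illustration

def bern_pmf(x, mu):
    """Bern(x; mu) = mu**x * (1 - mu)**(1 - x) for x in {0, 1}."""
    return mu**x * (1 - mu)**(1 - x)

print(bern_pmf(1, mu), bern_pmf(0, mu))    # 0.75 0.25
print(stats.bernoulli.pmf([0, 1], mu))     # [0.25 0.75]
```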

15 Probability mass and density functions

Definition (Probability density functions, continuous random variables)
A random variable X is continuous if there exists a function

p : R → R≥0, x ↦ p(x) (12)

such that
• p(x) ≥ 0 for all x ∈ R,
• ∫_{−∞}^{∞} p(x) dx = 1,
• P(a ≤ X ≤ b) = ∫_a^b p(x) dx for all a, b ∈ R with a ≤ b.

Remarks
• PDFs can take on values larger than 1 and P(X = a) = ∫_a^a p(x) dx = 0.
• Probabilities are obtained from PDFs by integration.
• (Probability) mass = (probability) density × (set) volume.
• The CDF of a PDF is P(x) = ∫_{−∞}^{x} p(ξ) dξ, thus p(x) = d/dx P(x).
• The CDF of a PDF is also referred to as a cumulative density function.
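To illustrate that probabilities are obtained from PDFs by integration, the sketch below integrates a PDF numerically with scipy; the choice of a standard Gaussian density and of the interval [−1, 1] is arbitrary.

```python
import numpy as np
from scipy import integrate, stats

pdf = stats.norm.pdf                              # example PDF: standard Gaussian density

total, _ = integrate.quad(pdf, -np.inf, np.inf)   # normalization: integral over R
prob, _ = integrate.quad(pdf, -1.0, 1.0)          # P(-1 <= X <= 1) by integration
point, _ = integrate.quad(pdf, 1.0, 1.0)          # P(X = a) = 0 for a continuous X

print(total, prob, point)                         # ~1.0, ~0.6827, 0.0
```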

16 Probability mass and density functions

Example (Gaussian random variable, standard normal variable)

Let X be a random variable with outcome set R and probability density function

p : R → R>0, x ↦ p(x) := 1/√(2πσ²) exp(−(x − µ)²/(2σ²)). (13)

Then X is said to be distributed according to a Gaussian distribution with parameters µ ∈ R and σ² > 0, for which we write X ∼ N(µ, σ²). We abbreviate the PDF of a Gaussian random variable by

N(x; µ, σ²) := 1/√(2πσ²) exp(−(x − µ)²/(2σ²)). (14)

A Gaussian random variable with µ = 0 and σ² = 1 is said to be distributed according to a standard normal distribution and is often referred to as a Z variable.

Remarks
• The parameter µ specifies the location of highest probability density.
• The parameter σ² specifies the width of the distribution.
• The term 1/√(2πσ²) is the normalization constant for exp(−(x − µ)²/(2σ²)).
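A sketch comparing the formula in (14) with scipy.stats.norm.pdf; the helper gauss_pdf and the parameter values µ = 1, σ² = 4 are illustrative assumptions (note that scipy parameterizes the normal by the standard deviation σ, not by σ²).

```python
import numpy as np
from scipy import stats

def gauss_pdf(x, mu, sigma2):
    """N(x; mu, sigma^2) as in equation (14)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

mu, sigma2 = 1.0, 4.0                                     # arbitrary parameter values
x = np.linspace(-3.0, 5.0, 5)

print(gauss_pdf(x, mu, sigma2))                           # density values from (14)
print(stats.norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))   # same values from scipy
```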

17 Probability mass and density functions

Example (Uniform random variables)

Let X be a discrete random variable with a finite outcome set 𝒳 and probability mass function

p : 𝒳 → R≥0, x ↦ p(x) := 1/|𝒳|. (15)

Then X is said to be distributed according to a discrete uniform distribution, for which we write X ∼ U(|𝒳|). We abbreviate the PMF of a discrete uniform random variable by

U(x; |𝒳|) := 1/|𝒳|. (16)

Similarly, let X be a continuous random variable with probability density function

p : R → R≥0, x ↦ p(x) := 1/(b − a) for x ∈ [a, b] and p(x) := 0 for x ∉ [a, b]. (17)

Then X is said to be distributed according to a continuous uniform distribution with parameters a and b, for which we write X ∼ U(a, b). We abbreviate the PDF of a continuous uniform random variable by

U(x; a, b) := 1/(b − a). (18)
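Both uniform variants are available in scipy.stats; a sketch with an arbitrary outcome set {0, ..., 4} and an arbitrary interval [a, b] = [2, 6] (scipy's continuous uniform is parameterized by loc = a and scale = b − a, and randint excludes its upper bound).

```python
import numpy as np
from scipy import stats

# Discrete uniform on {0, ..., 4}: p(x) = 1/|X| = 1/5.
print(stats.randint.pmf(np.arange(5), low=0, high=5))               # 0.2 for each value

# Continuous uniform on [a, b] = [2, 6]: p(x) = 1/(b - a) = 0.25 on [a, b], 0 outside.
a, b = 2.0, 6.0
print(stats.uniform.pdf([1.0, 3.0, 5.0, 7.0], loc=a, scale=b - a))  # 0, 0.25, 0.25, 0
```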

18 Probability mass and density functions

Properties of cumulative density functions

• P(X > x) = 1 − P(x) (Exceedance distribution function)
• P(x < X ≤ y) = P(y) − P(x) (Interval probability)
• With the properties of the Riemann integral, we have

P(y) − P(x) = P(x < X < y) = P(x ≤ X < y) = P(x < X ≤ y) = P(x ≤ X ≤ y). (19)
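A numerical check of these identities, with an arbitrary Gaussian distribution and arbitrary endpoints x and y:

```python
from scipy import stats

X = stats.norm(loc=0.0, scale=1.0)   # an arbitrary continuous distribution for illustration
x, y = -0.5, 1.5

print(1.0 - X.cdf(x))                # exceedance probability P(X > x)
print(X.cdf(y) - X.cdf(x))           # interval probability P(x < X <= y)
# For a continuous X, including or excluding the endpoints leaves this value unchanged,
# since P(X = x) = P(X = y) = 0.
```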

19 Probability mass and density functions

Definition (Inverse cumulative distribution function)

Let X be a random variable with CDF P. Then the inverse cumulative distribution function or quantile function of X is defined as

P⁻¹ : [0, 1] → R, q ↦ P⁻¹(q) := inf{x | P(x) > q}. (20)

If P is invertible, i.e., strictly increasing and continuous, then P⁻¹(q) is the unique real number x such that P(x) = q.

Remarks
• P⁻¹(0.25) is called the first quartile.
• P⁻¹(0.50) is called the median or second quartile.
• P⁻¹(q) is also referred to as the qth quantile.
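In scipy the inverse CDF is exposed as ppf (the "percent point function"). A sketch of the quartiles for an arbitrary Gaussian distribution:

```python
from scipy import stats

X = stats.norm(loc=1.0, scale=2.0)   # arbitrary distribution for illustration

print(X.ppf(0.25))                   # first quartile P^{-1}(0.25)
print(X.ppf(0.50))                   # median (second quartile) P^{-1}(0.50) = 1.0 here
print(X.ppf(0.75))                   # third quartile P^{-1}(0.75)
print(X.cdf(X.ppf(0.90)))            # applying P after P^{-1} recovers q = 0.9 (up to floating point)
```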

20 Probability mass and density functions

Example (CDF and inverse CDF for Gaussian random variables)

Let X be a univariate Gaussian random variable with expectation parameter µ and variance parameter σ². Then, X has
• probability density function

p : R → R, x ↦ p(x) := 1/√(2πσ²) exp(−(x − µ)²/(2σ²)),

• cumulative density function

P : R → [0, 1], x ↦ P(X ≤ x) = 1/√(2πσ²) ∫_{−∞}^{x} exp(−(ξ − µ)²/(2σ²)) dξ,

• and inverse cumulative density function

P⁻¹ : [0, 1] → R, q ↦ P⁻¹(q) = {x ∈ R | P(x) = q}.

Remark
• Let µ = 1, σ² = 1. Then p(2) = 0.24, P(2) = 0.84, and P⁻¹(0.84) = 2.
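The numbers in the remark can be reproduced with scipy.stats.norm (a sketch; note that scipy's scale argument is σ, which equals 1 here):

```python
from scipy import stats

X = stats.norm(loc=1.0, scale=1.0)   # mu = 1, sigma^2 = 1

print(round(X.pdf(2.0), 2))          # 0.24, i.e. p(2)
print(round(X.cdf(2.0), 2))          # 0.84, i.e. P(2)
print(round(X.ppf(X.cdf(2.0)), 2))   # 2.0, since P^{-1}(P(2)) = 2
```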

21 Example (CDF and inverse CDF for standard normal variables)

Let Z be a standard normal variable. Then, Z has
• probability density function

φ : R → R, z ↦ φ(z) := 1/√(2π) exp(−z²/2),

• cumulative density function

Φ : R → [0, 1], z ↦ Φ(z) := P(Z ≤ z) = 1/√(2π) ∫_{−∞}^{z} exp(−ξ²/2) dξ,

• inverse cumulative density function

Φ⁻¹ : [0, 1] → R, q ↦ Φ⁻¹(q) = {z ∈ R | Φ(z) = q}.

Examples
• φ(1.645) = 0.103, Φ(1.645) = 0.950, Φ⁻¹(0.950) = Φ⁻¹(1 − 0.050) = 1.645.
• φ(1.960) = 0.058, Φ(1.960) = 0.975, Φ⁻¹(0.975) = Φ⁻¹(1 − 0.050/2) = 1.960.
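These standard normal values can be checked in the same way (a sketch; scipy's norm with default arguments loc=0, scale=1 is the standard normal distribution):

```python
from scipy import stats

Z = stats.norm()                     # standard normal, loc=0, scale=1

for z in (1.645, 1.960):
    print(round(Z.pdf(z), 3), round(Z.cdf(z), 3), round(Z.ppf(Z.cdf(z)), 3))
# 0.103 0.95 1.645
# 0.058 0.975 1.96
```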
