Chapter 4

Commonly Used Distributions

4.1 Binomial Distribution §3.4

Bernoulli trial: the simplest kind of random trial (like coin tossing)

♠ Each trial has two outcomes: S or F.

♠ For each trial, P (S) = p .

♠ The trials are independent.

ORF 245: Common Distributions – J.Fan

Associated r.v.: let Yi be the outcome of the i-th trial. It takes only two values, {0, 1}, with pmf P(Yi = 1) = p and P(Yi = 0) = 1 − p = q, called the Bernoulli distribution. Then E(Yi) = p and var(Yi) = pq.

Generic name:

♣ Quality control: defective/good;

♠ Clinical trial: survival/death; disease/non-disease; cure/not cure; effective/ineffective;

♣ Sampling survey: male/female; old/young; Coke/Pepsi;

♠ Online ads: click/not click; ...

Example 4.1 Which of the following are Bernoulli trials?

♠ Roll a die 100 times and observe whether the number of spots is even or odd.

♠ Observe the daily temperature and see whether it is above or below freezing.

♠ Sample 1000 people and observe whether they belong to upper, middle or low SES class.

Definition: Let X be the number of successes in a Bernoulli process with n trials and probability of success p. Then the distribution of X is called a Binomial distribution, denoted by X ∼ Bin(n, p).

Fact: Let X ∼ Bin(n, p). Then

• (pmf) $b(x; n, p) = P(X = x) = \binom{n}{x} p^x q^{n-x}$, x = 0, 1, ···, n, where q = 1 − p.

• (summary) EX = np and var(X) = npq.

• (sum of indep. r.v.) X = Y1 + ··· + Yn, where each Yi is a Bernoulli r.v.
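The three facts can be checked numerically. The following is a minimal sketch, not part of the original notes: the values n = 8 and p = 0.3, the seed, and all variable names are assumptions chosen for illustration.

```r
# Illustrative check of the Binomial facts; n and p are arbitrary choices.
n <- 8
p <- 0.3
q <- 1 - p
x <- 0:n

# pmf from the formula choose(n, x) * p^x * q^(n - x), compared with dbinom
pmf_formula <- choose(n, x) * p^x * q^(n - x)
max_diff <- max(abs(pmf_formula - dbinom(x, n, p)))

# mean and variance computed directly from the pmf
mean_x <- sum(x * pmf_formula)
var_x  <- sum((x - mean_x)^2 * pmf_formula)

# X as a sum of n independent Bernoulli(p) outcomes (simulation)
set.seed(1)
y <- matrix(rbinom(10000 * n, size = 1, prob = p), ncol = n)
x_sim <- rowSums(y)

print(max_diff)               # numerically zero
print(c(mean_x, n * p))       # both equal np
print(c(var_x, n * p * q))    # both equal npq
print(mean(x_sim))            # close to np, up to simulation error
```

The simulated mean only approximates np; the pmf-based mean and variance agree with np and npq up to rounding.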

[Figure: pmf plots (y against x) of Bin(n, p) for p = 0.5 with n = 8, 25, 50; p = 0.3 with n = 8, 25, 60; p = 0.1 with n = 8, 25, 100.]

Figure 4.1: Binomial distributions with different parameters.

Example 4.2 Genetics

According to Mendel's theory, cross-fertilization of related species of red- and white-flowered plants produces offspring of which 25% are red-flowered. Crossing 5 pairs, what is the chance of

♣ no red flowered plant?

Let X = # of red flowered plants. Then, X ∼ Bin(5, 0.25).

$P(X = 0) = q^5 = 0.75^5 = 0.237.$

♣ no more than 3 red flowered plants?

$P(X \le 3) = 1 - P(X \ge 4) = 1 - \binom{5}{4}(0.25)^4(0.75) - (0.25)^5 = 0.984.$

Use dbinom, pbinom, qbinom, rbinom for the density (pmf), probability (cdf), quantile, and random numbers.

> dbinom(0,5,0.25) #pmf P(X = 0) or density for discrete r.v.
[1] 0.2373047
> pbinom(3,5,0.25) #cdf P(X <= 3)
[1] 0.984375
> dbinom(3,5,0.25) #pmf P(X = 3)
[1] 0.08789063

4.2 Hypergeometric and Negative Binomial Distributions §3.5

In a sampling survey, one typically draws w/o replacement.

Hypergeometric dist (use hyper in R: e.g. dhyper, phyper, qhyper, rhyper): a population of N items contains M defective and N − M good items; n draws w/o replacement yield X defective and n − X good items.

Figure 4.2: Illustration of sampling without replacement.

$P(X = x) = \binom{M}{x}\binom{N-M}{n-x} \Big/ \binom{N}{n}$, for max{0, n − N + M} ≤ x ≤ min(n, M).

Remarks

♠ When sampling with replacement, the distribution is binomial.

♠ When sampling a small fraction from a large population, like a poll, the trials are nearly Bernoulli. E.g. N = 40000 and n = 1000, and the # of families with income ≥ $50K is 40% (M = 16000). Then

P(S2 | S1) = 15999/39999 ≈ P(S2) and P(S1000 | S1 ··· S999) = 15001/39001 ≈ P(S1000).
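The second remark can be illustrated with the poll numbers above. This is a sketch under stated assumptions: the variable names and the range of x over which the two pmfs are compared are choices made here, not part of the notes.

```r
# Poll example: N = 40000 families, M = 16000 (40%) with income >= $50K, n = 1000 draws.
N <- 40000
M <- 16000
n <- 1000
p <- M / N

# Conditional success probabilities barely move when drawing w/o replacement.
p_second <- (M - 1) / (N - 1)       # P(S2 | S1) = 15999/39999
p_last   <- (M - 999) / (N - 999)   # P(S1000 | S1 ... S999) = 15001/39001

# Hypergeometric pmf (dhyper takes x, M, N - M, n) versus Binomial pmf near the mean
x <- 350:450
max_abs_diff <- max(abs(dhyper(x, M, N - M, n) - dbinom(x, n, p)))

print(c(p, p_second, p_last))
print(max_abs_diff)
```

The conditional probabilities stay close to 0.4 and the two pmfs differ only slightly, which is why the binomial model is commonly used for polls.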

Fact: $EX = np$ and $\mathrm{var}(X) = \frac{N-n}{N-1}\, npq$, where $p = M/N$ and $q = 1 - p$. The factor $(N-n)/(N-1)$ is the correction factor.

Example 4.3 Capture-recapture to estimate population size

Suppose that there are N animals in a region. Capture 5 of them, tag them, and then release them. Now among 10 newly captured animals, X of them are tagged.

Probabilistic question: If N = 25, how likely is it that X = 2? The distribution of X is “hyper” with N = 25, M = 5, n = 10:

$P(X = 2) = \binom{5}{2}\binom{20}{8} \Big/ \binom{25}{10} = .385.$

> dhyper(2,5, 20, 10) #pmf P(X = 2), N-M = 20
[1] 0.3853755
> phyper(2,5, 20, 10) #cdf P(X <= 2)
[1] 0.6988142
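The counting formula and the mean and variance facts for the hypergeometric distribution can be checked on these numbers. A minimal sketch; the variable names are illustrative and not from the notes.

```r
# Capture-recapture numbers from Example 4.3: N = 25, M = 5 tagged, n = 10 recaptured.
N <- 25; M <- 5; n <- 10
x <- max(0, n - N + M):min(n, M)   # support of X

# pmf from the counting formula, compared with dhyper
pmf <- choose(M, x) * choose(N - M, n - x) / choose(N, n)
max_diff <- max(abs(pmf - dhyper(x, M, N - M, n)))
p2 <- pmf[x == 2]                  # P(X = 2)

# mean and variance from the pmf, compared with np and the corrected npq
p <- M / N; q <- 1 - p
mean_x <- sum(x * pmf)
var_x  <- sum((x - mean_x)^2 * pmf)
var_formula <- (N - n) / (N - 1) * n * p * q

print(round(p2, 4))
print(c(mean_x, n * p))
print(c(var_x, var_formula))
```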

Statistical question: What is the population size if we observe 3 tagged animals? Thus, the realization of X, denoted by x, is 3. Since $EX = n \cdot \frac{5}{N}$ and an observed x is a reasonable estimate of EX, we set $x = n \cdot \frac{5}{N}$, or $\hat{N} = \frac{5n}{x} = 16.667$, or 17.

Definition: In a sequence of Bernoulli trials w/ P(S) = p, let Y be the number of draws required to get r successes and X = Y − r (so that it begins from 0). The distribution of X is called a negative Binomial (Use nbinom in R, namely dnbinom, pnbinom, qnbinom, rnbinom):

$P(X = x) = \binom{x + r - 1}{r - 1} p^r q^x, \quad x = 0, 1, 2, \cdots$

Illustration: the sequence FFS FFFFFFFS FFFS ··· FFFFFS has total length Y and splits into r blocks of lengths Y1, Y2, Y3, ···, Yr, each ending with an S.

When r = 1, Y follows a geometric distribution:

$P(Y = y) = p q^{y-1}, \quad y = 1, 2, \cdots.$

Recall that $EY_i = 1/p$ and $\mathrm{var}(Y_i) = q/p^2$. Then,

• Y = Y1 + ··· + Yr and X = Y − r.

• $EY = r/p$, $\mathrm{var}(Y) = rq/p^2$.

• $EX = EY - r = rq/p$, $\mathrm{var}(X) = rq/p^2$.
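These moments can be checked by simulating Y as a sum of r geometric waiting times. A sketch only: r = 3, p = 0.2, the seed and the number of replications are arbitrary choices made here. Note that R's rgeom counts failures before the first success, so one trial is added to each draw.

```r
# Illustrative values; r and p are arbitrary choices.
r <- 3
p <- 0.2
q <- 1 - p

# dnbinom(x, size = r, prob = p) is the pmf of X = number of failures before the r-th success
x <- 0:10
pmf_formula <- choose(x + r - 1, r - 1) * p^r * q^x
max_diff <- max(abs(pmf_formula - dnbinom(x, size = r, prob = p)))

# Simulate Y = Y1 + ... + Yr with geometric Yi (rgeom counts failures, so add 1 per block)
set.seed(1)
y <- replicate(20000, sum(rgeom(r, p) + 1))
x_sim <- y - r

print(max_diff)
print(c(mean(y), r / p))            # EY = r/p
print(c(mean(x_sim), r * q / p))    # EX = rq/p
print(c(var(x_sim), r * q / p^2))   # var(X) = rq/p^2
```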

4.3 The Poisson Distribution §3.6

Useful for modeling the number of rare events (# of traffic accidents; # of houses damaged by fire; # of customer arrivals to a service center).

When is the Poisson distribution (Use pois in R) reasonable?

• independence: # of events occurring in any time interval is independent of # of events in any other non-overlapping interval;

• rare: almost impossible for two or more events to occur simultaneously;

• constant rate: the average # of occurrences per time unit is constant, denoted by λ.

[Figure: line diagrams of dpois(x, λ) against x = 0, ···, 30 for (a) λ = 0.5, (b) λ = 1, (c) λ = 7, (d) λ = 15.]

Figure 4.3: The Poisson Distribution (line diagram) for different values of λ

Let X = # of occurrences in a unit time interval. It can be shown that the distribution, Poisson(λ), is given by

$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \cdots.$

x = 0:30
plot(x, dpois(x,0.5), xlab="(a)", type="n") #lambda = 0.5; empty plot to draw on
for(i in 0:30) lines(c(i,i), c(0,dpois(i, 0.5)), lwd=1.5, col="blue") #vertical line at each x
title("Poisson dist with lambda = 0.5")
plot(x, dpois(x,15), xlab="(d)", type="h", col="blue", lwd=1.5) #lambda = 15; type="h" draws the lines directly
title("Poisson dist with lambda = 15")

Poisson approximation to the Binomial: When n is large and p is small, and n and 1/p are of the same order of magnitude (e.g., n = 100, p = 0.01; n = 1000, p = 0.003), then

$\binom{n}{x} p^x q^{n-x} \approx \frac{e^{-\lambda}\lambda^x}{x!}$, with λ = np.

Example 4.4 Suppose that a manufacturing process produces 1% defective products. Compute P(# of defectives ≥ 2 in 200 products).

Note that Bin(200, 0.01) ≈ Poisson(2). Thus,

$P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - \frac{e^{-2}2^0}{0!} - \frac{e^{-2}2^1}{1!} = 0.594$

Direct calculation shows that the probability is 0.595.

Mean and variance: EX = λ and var(X) = λ. (Think of the Poisson approximation to the Binomial: np = λ and npq ≈ λ when p is small.)
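The two numbers in Example 4.4 can be reproduced with pbinom and ppois. A minimal sketch; the variable names and the range x = 0, ···, 10 used for the pmf comparison are choices made here.

```r
# Example 4.4: X ~ Bin(200, 0.01), approximated by Poisson(lambda = np = 2).
n <- 200
p <- 0.01
lambda <- n * p

p_exact  <- 1 - pbinom(1, n, p)    # P(X >= 2) under the Binomial
p_approx <- 1 - ppois(1, lambda)   # P(X >= 2) under the Poisson

# Largest pmf discrepancy over x = 0, ..., 10
max_pmf_diff <- max(abs(dbinom(0:10, n, p) - dpois(0:10, lambda)))

print(round(c(exact = p_exact, approx = p_approx), 3))
print(max_pmf_diff)
```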

4.4 The Normal Distribution §4.3

Definition: A continuous r.v. X has a normal distribution (“dnorm, pnorm, qnorm, rnorm” for density, probability (cdf), quantile, and random number generator) if its pdf is given by

$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty,$

denoted by $X \sim N(\mu, \sigma^2)$.

Facts: It can be shown that $EX = \mu$ and $\mathrm{var}(X) = \sigma^2$.

Figure 4.4: Normal (bell-shaped) distributions with different means (center) and variances (spread).

Standard normal distribution refers to the specific case in which µ = 0 and σ = 1. Let

$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ and $\Phi(z) = \int_{-\infty}^{z} \phi(u)\,du$

be the density and the cumulative distribution functions of the standard normal distribution (z-curve).

Figure 4.5: The density and the cdf.

Standardization: Suppose $X \sim N(\mu, \sigma^2)$. Then Z = (X − µ)/σ transforms X into standard units (SUs). It says how many SDs x is above or below the mean µ. Indeed, Z ∼ N(0, 1).

Computation of the area under the normal curve: Put the endpoints into SUs and compute the area under the z-curve:

$P(a \le X \le b) = P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$
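The standardization recipe can be written as a small helper. This is a sketch: normal_area is a hypothetical function name introduced here (it is not an R built-in or part of the notes), and the numbers in the comparison are arbitrary.

```r
# Hypothetical helper (not from the notes): area under N(mu, sigma^2) between a and b.
normal_area <- function(a, b, mu = 0, sigma = 1) {
  za <- (a - mu) / sigma   # lower endpoint in standard units
  zb <- (b - mu) / sigma   # upper endpoint in standard units
  pnorm(zb) - pnorm(za)    # Phi(zb) - Phi(za)
}

# The same area computed without standardizing, for comparison (arbitrary values)
area_su     <- normal_area(2, 7, mu = 4, sigma = 3)
area_direct <- pnorm(7, mean = 4, sd = 3) - pnorm(2, mean = 4, sd = 3)
print(c(area_su, area_direct))
```

pnorm also accepts mean and sd directly, so the helper is only there to make the two steps (convert to SUs, then use the z-curve) explicit.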

Example 4.5 Finding areas under the normal curve.

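(a) If $X \sim N(3, 2^2)$, compute P(1 ≤ X ≤ 6).

$P(1 \le X \le 6) = P\left(\frac{1 - 3}{2} \le Z \le \frac{6 - 3}{2}\right) = \Phi\left(\frac{6 - 3}{2}\right) - \Phi\left(\frac{1 - 3}{2}\right) = \Phi(1.5) - \Phi(-1)$

= .9332 − .1587 = 77.45%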

(b) If $X \sim N(-5, 6^2)$, find P(X ≥ −2).

$P(X \ge -2) = P\left(Z \ge \frac{-2 + 5}{6}\right) = \Phi(\infty) - \Phi\left(\frac{-2 + 5}{6}\right) = 1 - \Phi(0.5)$

= 1 − .6915 = 30.85%

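(c) If $X \sim N(0, \sigma^2)$, find P(−σ ≤ X ≤ σ).

$P(-\sigma \le X \le \sigma) = P\left(\frac{-\sigma - 0}{\sigma} \le Z \le \frac{\sigma - 0}{\sigma}\right) = P(-1 \le Z \le 1) = \Phi(1) - \Phi(-1)$

= .8413 − .1587 = 68.26%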

(d) If $X \sim N(0, \sigma^2)$, find P(−2σ ≤ X ≤ 2σ).

$P(-2\sigma \le X \le 2\sigma) = P(-2 \le Z \le 2) = \Phi(2) - \Phi(-2)$

= .9772 − .0228 = 95.44%.

[Figure: the standard normal density (y against x from −3 to 3).]

Figure 4.6: Areas under the z-curve (standard normal curve).

Example 4.6 Suppose that the arrival time of a passenger at a train station is normally distributed with mean 11:55am and SD 10 minutes. If the train is scheduled to leave at 12:03pm, what is the chance that the passenger will catch the train?

Let X be the arrival time (in minutes) relative to 12:00pm. Then $X \sim N(-5, 10^2)$. The probability is

$P(X < 3) = P\left(Z < \frac{3 + 5}{10}\right) = \Phi(0.8)$ = 78.81%.
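The areas in Examples 4.5 and 4.6 can be reproduced with pnorm. A minimal sketch; the variable names are illustrative, and σ = 1 is used in parts (c) and (d) because those answers do not depend on σ.

```r
# Example 4.5 (a)-(d); sigma = 1 in (c) and (d) since the answer is free of sigma
a   <- pnorm(6, mean = 3, sd = 2) - pnorm(1, mean = 3, sd = 2)
b   <- 1 - pnorm(-2, mean = -5, sd = 6)
c68 <- pnorm(1) - pnorm(-1)
d95 <- pnorm(2) - pnorm(-2)

# Example 4.6: arrival time in minutes relative to 12:00pm, X ~ N(-5, 10^2)
catch <- pnorm(3, mean = -5, sd = 10)

print(round(100 * c(a, b, c68, d95, catch), 2))
```

pnorm works with unrounded values of Φ, so the last digit can differ slightly from answers obtained with a four-decimal normal table.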

Quantile of normal distribution: Let $z_\alpha$ be the value such that

$P(Z \ge z_\alpha) = \alpha,$

the upper α percentile. It can be found through qnorm. For example,

Table 4.1: Upper quantile of normal distribution

α     0.0005   0.001   0.005   0.01    0.025   0.05    0.10
zα    3.291    3.090   2.576   2.326   1.960   1.645   1.282
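Table 4.1 can be regenerated with qnorm. A short sketch; since qnorm takes a lower-tail probability by default, the upper quantile is obtained either as qnorm(1 − α) or with lower.tail = FALSE. The variable names are illustrative.

```r
alpha <- c(0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.10)

z_alpha <- qnorm(1 - alpha)                  # upper quantile: P(Z >= z_alpha) = alpha
z_alt   <- qnorm(alpha, lower.tail = FALSE)  # equivalent call

print(rbind(alpha, z_alpha = round(z_alpha, 3)))
```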

4.5 Normal approximation to a data histogram

Normal approx: Use a probability histogram to approximate a data histogram: approximate a data histogram with average µ and SD σ by a normal curve $N(\mu, \sigma^2)$. This is valid when the data histogram follows a bell curve.

Sufficient summary: n data values can be summarized by two numbers: x̄ (mean) and s (SD). From these two numbers, we can infer all useful information about the original data. Thus, for bell-shaped data histograms, x̄ and s are sufficient.

Example 4.7 Normal approximation

The blood pressures of the users aged 25 to 34 in a drug study averaged out to 121mm with an SD of 12.5mm. The data histogram follows a normal curve.

1. Approximately, what percentage of users have blood pressure between 96mm and 133.5mm?

$P(96 \le X \le 133.5) = \Phi\left(\frac{133.5 - 121}{12.5}\right) - \Phi\left(\frac{96 - 121}{12.5}\right) = \Phi(1) - \Phi(-2)$

= .8413 − .0228 = 81.85%

2. If the top 10% of users need to be singled out for further investigation, what is the cut-off point?

(a) Convert the percentile into the standard units (SU):

upper 10th percentile = 90th percentile = z.10 = 1.28 SU

(b) Convert the standard unit into the original unit:

1.28 SU ←→ 121 + 1.28 × 12.5 = 137mm
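Both parts of Example 4.7 can be done in R with pnorm and qnorm. A minimal sketch with illustrative variable names; it assumes, as the example does, that the histogram is well approximated by N(121, 12.5^2).

```r
mu <- 121; sigma <- 12.5          # blood pressure summary from Example 4.7

# Part 1: percentage between 96mm and 133.5mm
pct_between <- pnorm(133.5, mu, sigma) - pnorm(96, mu, sigma)

# Part 2: cut-off for the top 10%
cutoff_su <- qnorm(0.90)                  # upper 10th percentile in standard units
cutoff_mm <- mu + cutoff_su * sigma       # back to the original unit
cutoff_direct <- qnorm(0.90, mu, sigma)   # same value in one call

print(round(100 * pct_between, 2))
print(round(c(cutoff_su, cutoff_mm, cutoff_direct), 2))
```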

Example 4.8 Normal approximation and percentile

The IQ in a particular population (as measured by a standard test) is known to be approximately normally distributed with mean 100 and SD 15.

1. What percentage of the population has an IQ of at least 125? Let $X \sim N(100, 15^2)$. The percentage is given by:

$P(X \ge 125) = P\left(Z \ge \frac{125 - 100}{15}\right) = 1 - \Phi(1.67)$ = 4.78%

2. What is the 95th percentile of the population?

95th percentile = z.05 = 1.645 SU ←→ 100 + 1.645 × 15 = 124.7
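The IQ calculations can be reproduced as follows. A sketch with illustrative variable names; the 97.5th percentile is included only for comparison, since 1.96 SU corresponds to an upper tail of 2.5% (see Table 4.1) while the 95th percentile corresponds to 1.645 SU.

```r
mu <- 100; sigma <- 15

p_at_least_125 <- 1 - pnorm(125, mu, sigma)   # P(X >= 125)
q95  <- qnorm(0.95, mu, sigma)                # 95th percentile  (1.645 SU)
q975 <- qnorm(0.975, mu, sigma)               # 97.5th percentile (1.96 SU), for comparison

print(round(100 * p_at_least_125, 2))
print(round(c(q95, q975), 1))
```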

4.6 The exponential and gamma distributions §4.4

Definition: A continuous r.v. X has an exponential distribution (use exp in R) if its pdf is given by

f(x; λ) = λ exp(−λx), x ≥ 0.

Facts:

♠ F(x; λ) = P(X ≤ x) = 1 − exp(−λx), x ≥ 0.

♠ $EX = 1/\lambda$ and $\mathrm{var}(X) = 1/\lambda^2$.

♠ memoryless property:

$P(X > a + x \mid X > a) = \frac{P(X > a + x)}{P(X > a)} = \frac{\exp(-\lambda(a + x))}{\exp(-\lambda a)} = \exp(-\lambda x) = P(X > x).$
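With the pdf f(x; λ) = λ exp(−λx), λ is a rate, which is also how R's exp functions (dexp, pexp, qexp, rexp) are parameterized. The sketch below checks the memoryless property and the mean numerically under that parameterization; the values of λ, a and x are arbitrary choices made here.

```r
lambda <- 2        # rate; arbitrary illustrative value
a <- 0.7; x <- 0.4

# Survival function P(X > t) = exp(-lambda * t)
surv <- function(t) 1 - pexp(t, rate = lambda)
cond <- surv(a + x) / surv(a)      # P(X > a + x | X > a)

# Mean by numerical integration of t * f(t); equals 1/lambda under the rate parameterization
mean_num <- integrate(function(t) t * dexp(t, rate = lambda), 0, Inf)$value

print(c(cond, surv(x), exp(-lambda * x)))   # all three agree
print(c(mean_num, 1 / lambda))
```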

Gamma distribution: Models lifetimes without the memoryless property, income distributions, etc. (Use gamma in R)

$f(x; \alpha, \lambda) = \frac{1}{\lambda^{\alpha}\Gamma(\alpha)} x^{\alpha - 1} \exp(-x/\lambda)$, for x ≥ 0.

♠ α > 0: shape parameter,

♠ λ > 0: scale parameter.

♠ α = 1 ⟹ exponential dist, i.e. Gamma(1, λ) is the exponential distribution with mean λ (rate 1/λ).

Figure 4.7: Left panel: Gamma densities with λ = 1 and α = 0.6, 1, 3. Right panel: Gamma densities with α = 3, λ = 0.5, 1, 2.
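The gamma density above uses λ as a scale parameter, whereas the second positional parameter after x in R's dgamma is the shape and the third is the rate, so the scale is best passed by name. A minimal sketch with arbitrary values of α and λ:

```r
alpha  <- 3      # shape; illustrative
lambda <- 0.5    # scale; illustrative
x <- seq(0.1, 5, by = 0.1)

# Density from the formula, with lambda as the scale parameter
f_formula <- x^(alpha - 1) * exp(-x / lambda) / (lambda^alpha * gamma(alpha))
diff_gamma <- max(abs(f_formula - dgamma(x, shape = alpha, scale = lambda)))

# alpha = 1 gives an exponential density with mean lambda, i.e. rate 1/lambda
diff_exp <- max(abs(dgamma(x, shape = 1, scale = lambda) - dexp(x, rate = 1 / lambda)))

print(c(diff_gamma, diff_exp))   # both numerically zero
```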

4.7 Probability (Q-Q) plots §4.6

Purpose: To check whether the data x1, ··· , xn follow a dist F .

Q-Q plot: Let x(1) ≤ x(2) ≤ · · · ≤ x(n) be the sorted data, called the order statistics. Then x(i) is roughly the (i − 0.5)/n (empirical) quantile. The Q-Q plot shows

$\{(\text{theoretical quantiles}, \text{empirical quantiles})\} = \left\{\left(F^{-1}\left(\frac{i - 0.5}{n}\right), x_{(i)}\right)\right\}.$

Check whether the plot looks reasonably like a straight line.

• Normal probability plot: display $\{(\Phi^{-1}((i - 0.5)/n), x_{(i)})\}$.

• Extending to compare distributions of two data sets.

x = seq(-4, 4, 0.02) #grid for plotting the densities
plot(x,dunif(x,-1,1),col="red",type="l",lwd=2) #uniform density
lines(x,dnorm(x), col="blue", lwd=2) #normal density
lines(x,dgamma(x+3,3,1), col="black",lwd=2) #shifted Gamma density
title("Unif(-1,1), N(0,1), Gamma(3,1)-3")
x <- runif(400,-1,1) #uniform random numbers
qqnorm(x, col="blue", main="") #checking normality, taking default title away
qqline(x, lwd=2, col="red") #add a line connecting Q_1 and Q_3
title("(a) unif(-1,1): lighter")
x <- rnorm(400) #normal random numbers
qqnorm(x, col="blue", main=""); qqline(x, lwd=2, col="red")
title("(b) N(0,1): about right")
x <- rgamma(400, 3, 1) - 3 #shifted Gamma random numbers
qqnorm(x, col="blue", main=""); qqline(x, lwd=2, col="red")
title("(c) gamma(3,1)-3: heavier")

[Figure: densities of Unif(−1,1), N(0,1) and Gamma(3,1)−3, followed by normal Q-Q plots (Sample Quantiles against Theoretical Quantiles) for panels (a), (b), (c).]

Figure 4.8: Normal probability plot for 400 data points generated from: (a) uniform(-1, 1) distribution (lighter tail); (b) N(0,1) distribution (correct); (c) Gamma(3, 1) − 3 (heavier tail).
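The coordinates of a normal probability plot can also be computed by hand from the definition. A sketch under stated assumptions: the seed and variable names are choices made here, and the comparison relies on qqnorm using the plotting positions (i − 0.5)/n, which is the case for n > 10 (via ppoints).

```r
set.seed(1)
n <- 400
x <- runif(n, -1, 1)                 # data to be checked against a normal distribution

theo <- qnorm((1:n - 0.5) / n)       # theoretical quantiles Phi^{-1}((i - 0.5)/n)
emp  <- sort(x)                      # empirical quantiles x_(i)

# qqnorm returns the plotted coordinates; plot.it = FALSE skips the drawing
qq <- qqnorm(x, plot.it = FALSE)
same_x <- isTRUE(all.equal(sort(qq$x), theo))
same_y <- isTRUE(all.equal(sort(qq$y), emp))

print(c(same_x, same_y))
print(cor(theo, emp))                # a rough numerical summary of straightness
# plot(theo, emp) would draw the normal probability plot by hand
```

The correlation is only a crude summary; the shape of the departure at the two ends of the plot is what indicates lighter or heavier tails.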

Example 4.9 Tails of Financial Returns.

We now consider the financial returns of SP500 and IBM, as well as SP500 before and during the 2008 financial crisis.

[Figure: four Q-Q plots: (a) SP500: heavier (Sample Quantiles against Theoretical Quantiles); (b) rSP500 against t(3.5) (rSP500 against qt4); (c) SP500 before & after crisis (rSP500a against rSP500b); (d) Returns of IBM against SP500 (rIBM against rSP500).]

Figure 4.9: Comparisons of distributions of financial returns.

pdf("FinReturns.pdf", width=10, height=3, pointsize=8) #print to pdf file
par(mfrow = c(1,4), mar=c(4,4,4,1)+0.1, cex=0.8)
qqnorm(rSP500, col="blue", main="") #checking normality
qqline(rSP500, lwd=2, col="red") #add a line
title("(a) SP500: heavier") #add a title
x = (1:length(rSP500)-0.5)/length(rSP500) #compute percents
qt4 = qt(x,3.5) #compute theoretical quantiles of t(3.5)
res=qqplot(qt4, rSP500, col="blue", main="") #draw the QQplot and save the results
x=res$x #extract x component of the results
y=res$y #extract y component
abline(lsfit(x,y), col="red", lwd=2) #draw a regression line through the data
title("(b) rSP500 against t(3.5)") #add title
res=qqplot(rSP500b, rSP500a, col="blue", main="") #compare dist of two data sets and save results
x=res$x
y=res$y
abline(lsfit(x,y), col="red", lwd=2) #draw a regression line
title("(c) SP500 before & after crisis")
res=qqplot(rSP500, rIBM, col="blue", main="")
x=res$x
y=res$y
abline(lsfit(x,y), col="red", lwd=2)
title("(d) Returns of IBM against SP500")
dev.off() #close the pdf device so that the file is written

Definition: The t-distribution (Use t in R) with degree of freedom ν has the density

$f_\nu(x) = d_\nu^{-1} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2},$

where $d_\nu = B(0.5, 0.5\nu)\sqrt{\nu}$. It has mean 0 (for ν > 1) and variance ν/(ν − 2) (for ν > 2).
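The density formula and the variance can be checked against R's dt. A minimal sketch; ν = 5 and the grid of x values are arbitrary choices made here, and beta() is R's built-in Beta function B(a, b).

```r
nu <- 5                                   # degrees of freedom; illustrative
x <- seq(-5, 5, by = 0.25)

d_nu <- beta(0.5, 0.5 * nu) * sqrt(nu)    # normalizing constant d_nu
f_formula <- (1 + x^2 / nu)^(-(nu + 1) / 2) / d_nu
max_diff <- max(abs(f_formula - dt(x, df = nu)))

# Variance by numerical integration, compared with nu / (nu - 2)
var_num <- integrate(function(t) t^2 * dt(t, df = nu), -Inf, Inf)$value

print(max_diff)
print(c(var_num, nu / (nu - 2)))
```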

[Figure: t-distributions with different dfs (density against x from −5 to 5).]

Figure 4.10: Densities of tν with ν = 1, 3, 5, 10 and ∞.

A 5%-event under the normal distribution occurs once in about 14000 years (= 1/pnorm(−5)/252), under the t10-distribution once in about 15 years, and under the t4.5-distribution once in about 1.5 years (= 1/pt(−5, 4.5)/252). Note: pnorm(−5) = 2.8665 × 10^−7. There are 252 trading days per year.
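The waiting times quoted above follow from the tail probabilities at −5. A short sketch with illustrative variable names; it reads the event as a daily value at or below −5 on the scale of the fitted distribution and takes 252 trading days per year, as in the text.

```r
trading_days <- 252

# Expected waiting time (in years) for a daily value at or below -5
years_normal <- 1 / pnorm(-5) / trading_days
years_t10    <- 1 / pt(-5, df = 10) / trading_days
years_t45    <- 1 / pt(-5, df = 4.5) / trading_days

print(pnorm(-5))
print(signif(c(normal = years_normal, t10 = years_t10, t4.5 = years_t45), 3))
```

The comparison shows how strongly the implied frequency of extreme events depends on the tail of the assumed distribution.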