Chapter 5, Probability Distributions

5.1 Introduction - In this chapter, we discuss the most commonly used probability distributions, both discrete and continuous.

- A discrete probability distribution is used when the sample space is discrete (countable). The discrete distributions covered in this chapter are:
  - discrete uniform
  - binomial and multinomial
  - hypergeometric
  - negative binomial
  - geometric
  - Poisson

- A continuous probability distribution is used when the sample space is continuous. The continuous distributions covered are:
  - uniform
  - normal (or Gaussian)
  - Gamma
  - Beta
  - t distribution
  - F distribution
  - χ² (chi-square) distribution

5.2 Discrete uniform distribution - the definition: if a random variable X assumes the values x1, x2, ..., xk with equal probabilities, then X follows the discrete uniform distribution, and its probability function is given below:

f(x; k) = 1/k,  x = x1, x2, ..., xk

- the mean and variance:

μ = (1/k) Σ xi,  σ² = (1/k) Σ (xi - μ)²,  where the sums run over i = 1, ..., k
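As a quick numerical check of these formulas (not from the original notes), the following Python sketch evaluates the probability function, mean, and variance for an assumed set of values, the six faces of a fair die:

    # Discrete uniform distribution: check the mean and variance formulas,
    # using the six faces of a fair die as the values x1, ..., xk.
    values = [1, 2, 3, 4, 5, 6]
    k = len(values)

    pmf = 1 / k                                       # f(x; k) = 1/k for every value
    mean = sum(values) / k                            # mu = (1/k) * sum(x_i)
    var = sum((x - mean) ** 2 for x in values) / k    # sigma^2 = (1/k) * sum((x_i - mu)^2)

    print(pmf, mean, var)                             # 0.1667..., 3.5, 2.9167...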

5.3 Binomial and multinomial distributions - First, let us introduce the Bernoulli process. If:
  - the outcome of the process is either a success (X = 1) or a failure (X = 0), and
  - the probability of success is P(X = 1) = p and the probability of failure is P(X = 0) = 1 - p = q,
then the process is a Bernoulli process.

- The probability distribution of the Bernoulli process: p(x) = p^x (1 - p)^(1-x),  x = 0, 1 and 0 < p < 1

- The mean and the variance: E(X) = p V(X) = p(1 - p)

- An example: what is the probability of picking a male student? X = 1: a male student, with probability p = 8/12 = 2/3; X = 0: a female student, with probability 1 - p = 1/3. Thus, the probability distribution is: p(x) = (2/3)^x (1/3)^(1-x),  x = 0, 1

In addition, the mean: p = 2/3 and the variance V = (2/3)(1/3) = 2/9

- Binomial distribution: the binomial distribution is defined based on the Bernoulli process; it is made up of n independent Bernoulli trials. Suppose that X1, X2, ..., Xn are independent Bernoulli random variables; then Y = Σ Xi follows a binomial distribution. (Note that Y is the number of successes among the n trials.)

- The probability distribution of the binomial distribution is:

P(Y = y) = C(n, y) p^y (1 - p)^(n-y),  y = 0, 1, ..., n

where C(n, y) = n! / (y! (n - y)!) is the binomial coefficient.

- The student example: pick three students from the 12 students (note that we must sample with replacement in order to keep the same probability and independence). With p = 2/3:
  - no male student among the 3: the only outcome is FFF; the probability is C(3, 0) (1 - p)^3 = 0.037
  - one male student among the 3: the outcomes are MFF, FMF, FFM; the probability is C(3, 1) p (1 - p)^2 = 0.222
  - two male students among the 3: the outcomes are MMF, MFM, FMM; the probability is C(3, 2) p^2 (1 - p) = 0.444
  - three male students among the 3: the only outcome is MMM; the probability is C(3, 3) p^3 = 0.296
In general, for this example, P(Y = y) = C(3, y) (2/3)^y (1/3)^(3-y),  y = 0, 1, 2, 3.

The general formula given earlier can be derived in the same manner.
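For instance, the four probabilities above can be reproduced with a short Python sketch; math.comb supplies the binomial coefficient C(n, y):

    from math import comb

    # Binomial check for the student example: n = 3 picks with replacement,
    # probability of a male student p = 2/3.
    n, p = 3, 2 / 3

    def binom_pmf(y, n, p):
        """P(Y = y) = C(n, y) * p^y * (1 - p)^(n - y)."""
        return comb(n, y) * p**y * (1 - p) ** (n - y)

    for y in range(n + 1):
        print(y, round(binom_pmf(y, n, p), 3))
    # 0 0.037, 1 0.222, 2 0.444, 3 0.296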

- Mean and variance of the binomial distribution:

E(Y) = Σ E(Xi) = Σ p = np

V(Y) = Σ V(Xi) = Σ p(1 - p) = np(1 - p)

- the example: find the mean and variance of the number of male students picked, and then use Chebyshev's theorem to interpret the interval μ ± 2σ. μ = (3)(2/3) = 2; σ² = (3)(2/3)(1/3) = 2/3, so σ = 0.816

at k = 2, μ + 2σ = 2 + (2)(0.816) = 3.63 and μ - 2σ = 2 - (2)(0.816) = 0.37

(1 - 1/k²) = 3/4. Therefore, there should be a probability of at least 3/4 that the number of male students picked is between 0.37 and 3.63, i.e., between 1 and 3. Indeed, the probability is actually p(1) + p(2) + p(3) = 0.963.
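A small Python sketch (for the same Y following a binomial distribution with n = 3 and p = 2/3) confirms that the exact probability of falling inside μ ± 2σ is well above the Chebyshev bound:

    from math import comb

    # Chebyshev check for the student example: Y ~ Binomial(3, 2/3).
    n, p = 3, 2 / 3
    pmf = lambda y: comb(n, y) * p**y * (1 - p) ** (n - y)

    mu = n * p                         # 2.0
    sigma = (n * p * (1 - p)) ** 0.5   # 0.816...

    lo, hi = mu - 2 * sigma, mu + 2 * sigma            # about (0.37, 3.63)
    exact = sum(pmf(y) for y in range(n + 1) if lo < y < hi)
    print(round(exact, 3))             # 0.963, well above the Chebyshev bound of 0.75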

- Binomial probabilities can also be read from the binomial distribution table, which is indexed by n and p.

- Multinomial distribution: this is an extension of the binomial distribution. Consider n independent trials, each resulting in one of k possible outcomes with probabilities p1, p2, ..., pk, where p1 + p2 + ... + pk = 1. Let X1, X2, ..., Xk be the numbers of trials that result in each outcome (so x1 + x2 + ... + xk = n). Then (X1, X2, ..., Xk) follows the multinomial distribution with probability distribution:

f(x1, ..., xk; p1, ..., pk, n) = [n! / (x1! x2! ... xk!)] p1^x1 p2^x2 ... pk^xk

5.4 Hypergeometric Distribution - The example: what is the probability of picking a given number of male students when three students are picked in a row? Note that this time the samples are not independent; that is, we are sampling without replacement. As a result we need to use the hypergeometric distribution. The following shows how the distribution is formed (8 male and 4 female students, 3 picked from 12):
  - no male student among the 3: total C(12, 3), male C(8, 0), female C(4, 3); probability = C(8, 0) C(4, 3) / C(12, 3)
  - one male student among the 3: total C(12, 3), male C(8, 1), female C(4, 2); probability = C(8, 1) C(4, 2) / C(12, 3)
  - two male students among the 3: total C(12, 3), male C(8, 2), female C(4, 1); probability = C(8, 2) C(4, 1) / C(12, 3)
  - three male students among the 3: total C(12, 3), male C(8, 3), female C(4, 0); probability = C(8, 3) C(4, 0) / C(12, 3)
In general, the probability distribution for this example is:

P(Y = y) = C(8, y) C(4, 3 - y) / C(12, 3),  y = 0, 1, 2, 3
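These four probabilities can be checked numerically; the sketch below simply evaluates C(8, y) C(4, 3 - y) / C(12, 3):

    from math import comb

    # Hypergeometric check: 3 students drawn without replacement from
    # 12 students (8 male, 4 female); Y = number of males drawn.
    N, k, n = 12, 8, 3

    for y in range(n + 1):
        prob = comb(k, y) * comb(N - k, n - y) / comb(N, n)
        print(y, round(prob, 3))
    # 0 0.018, 1 0.218, 2 0.509, 3 0.255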

- the general formula of the hypergeometric distribution:

P(Y = y) = C(k, y) C(N - k, n - y) / C(N, n),  y = 0, 1, 2, ..., n

where N is the population size, k is the number of successes in the population, and n is the sample size.

- the mean and the variance of the hypergeometric distribution:

μ = n k / N

σ² = [(N - n) / (N - 1)] · n · (k / N) · (1 - k / N)

As a special case, let N go to infinity; then k / N = p and (N - n) / (N - 1) → 1. Hence: μ = np and σ² = np(1 - p).

That is, the hypergeometric distribution approaches the binomial distribution.

- We can also define the multivariate hypergeometric distribution

5.5 Negative Binomial and Geometric Distributions - An example: when picking three students, what is the probability that the third student picked is the second male?
  - one possibility is FMM, and its probability is (1 - p) p^2
  - the other possibility is MFM, and its probability is (1 - p) p^2

3  1  note that there are  combinations, and hence, the probability is: 2  1  3  1  f ( X  3 , k  2 )   1  p p 2 2  1 

- The general formula for the negative binomial distribution is as follows:

f(X = x) = C(x - 1, k - 1) p^k (1 - p)^(x - k),  x = k, k+1, k+2, ...

where x is the number of trials needed to obtain the kth success.

- the mean and variance of the negative binomial distribution: E(X) = k/p, V(X) = k(1 - p)/p²
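The example above and these moments can be checked numerically; the sketch below evaluates the probability function directly (nbinom_pmf is just the formula f(X = x)):

    from math import comb

    # Negative binomial: X = trial on which the k-th success occurs,
    # f(X = x) = C(x - 1, k - 1) * p^k * (1 - p)^(x - k), x = k, k+1, ...
    def nbinom_pmf(x, k, p):
        return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)

    p, k = 2 / 3, 2
    print(round(nbinom_pmf(3, k, p), 3))       # 0.296: the third pick is the second male

    # Numerically confirm E(X) = k/p and V(X) = k(1 - p)/p^2 by truncating the sums.
    xs = range(k, 200)
    mean = sum(x * nbinom_pmf(x, k, p) for x in xs)
    var = sum(x**2 * nbinom_pmf(x, k, p) for x in xs) - mean**2
    print(round(mean, 3), round(var, 3))       # 3.0 and 1.5, i.e. k/p and k(1-p)/p^2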

- another example: picking students until we get a male student:
  - the first pick is male: probability p
  - the second pick is the first male: probability (1 - p) p
  - the third pick is the first male: probability (1 - p)² p

- the general formula is: f(X = x) = (1 - p)^(x-1) p,  x = 1, 2, 3, ...

This is the geometric distribution.

- the mean and variance of the geometric distribution: E(X) = 1/p, V(X) = (1 - p)/p²

5.6 Poisson Distribution - A Poisson process is a random process in which a discrete event takes place over a continuous interval of time or region of space. Examples of Poisson processes include:
  - the arrival of telephone calls at a switchboard,
  - cars passing an electronic checking device.

Note that all these examples involve a discrete random event. In any given small period of time (or small region), the probability that the event occurs is small; however, over a long time (or large region), the number of occurrences is large.

- Poisson distribution plays an extremely important role in science and engineering, since it represents an appropriate probabilistic model for a large number of observational phenomena.

- The Poisson distribution can be described by the following formula:

p(x; λt) = e^(-λt) (λt)^x / x!,  x = 0, 1, 2, ...

where λ is the average number of outcomes per unit time (or per unit region). Hence, λt is the expected number of outcomes in an interval of length t.

Proof: refer to the textbook.

- The Poisson distribution can be considered an approximation to the binomial distribution when n is large and p is small.
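A rough numerical illustration of this approximation (with an assumed expected count of 3) compares the binomial probabilities, using p = λT/n, against the Poisson probability as n grows:

    from math import comb, exp, factorial

    # Divide an interval with expected count lam_T = 3 into n sub-intervals,
    # each with success probability lam_T / n; as n grows, the binomial pmf
    # approaches the Poisson pmf.
    lam_T = 3.0

    def binom_pmf(x, n, p):
        return comb(n, x) * p**x * (1 - p) ** (n - x)

    def poisson_pmf(x, lam):
        return exp(-lam) * lam**x / factorial(x)

    for n in (10, 100, 10_000):
        print(n, round(binom_pmf(4, n, lam_T / n), 4), round(poisson_pmf(4, lam_T), 4))
    # the binomial value at x = 4 tends to the Poisson value 0.1680 as n increases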

- From a physical point of view: given a time interval of length T, divide it into n equal sub-intervals of length Δt (Δt → 0), so that T = nΔt, and assume:
  - the probability of a success in any sub-interval Δt is λΔt,
  - the probability of more than one success in any sub-interval Δt is negligible,
  - the probability of a success in any sub-interval does not depend on what happened prior to that time.

Then, we have the Poisson distribution.

- Mean and variance of the Poisson distribution: E(X) = λt and V(X) = λt.

- An example: in a large company, industrial accidents occur at a mean rate of three per week (λt = 3) (note that accidents occur independently).
  - the probability distribution: p(y) = 3^y e^(-3) / y!,  y = 0, 1, 2, ...
  - the probabilities can be determined by direct calculation or by checking the Poisson distribution table.

  - the probability of less than or equal to four accidents in a week: P(Y ≤ 4) = p(0) + p(1) + p(2) + p(3) + p(4) = 0.815

  - the probability of four or more accidents: P(Y ≥ 4) = 1 - P(Y ≤ 3) = 0.353

  - the probability of exactly four accidents: P(Y = 4) = P(Y ≤ 4) - P(Y ≤ 3) = 0.168; note that this is the same as p(4) = 0.168
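These three probabilities can be verified directly from the Poisson probability function, as in the sketch below:

    from math import exp, factorial

    # Poisson check for the accident example with mean lam_t = 3 per week.
    lam_t = 3.0
    pmf = lambda y: exp(-lam_t) * lam_t**y / factorial(y)

    p_le_4 = sum(pmf(y) for y in range(5))       # P(Y <= 4)
    p_ge_4 = 1 - sum(pmf(y) for y in range(4))   # P(Y >= 4) = 1 - P(Y <= 3)
    print(round(p_le_4, 3), round(p_ge_4, 3), round(pmf(4), 3))   # 0.815 0.353 0.168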

5.7 Uniform Distribution - The uniform distribution is a continuous probability distribution.
  - the assumption: the random outcome is equally likely anywhere within an interval
  - an example: receiving an express mail at any time between 1 and 5 pm

- The probability density function (pdf)

 1  a  x  b f (x)  b  a  0 elsewhere

- By integration, we obtain the probability function (the cumulative distribution function, CDF):

 0 x  a  x  a F(x)   a  x  b b  a  1 b  x

- A comparison between discrete distributions and continuous distributions:
  - for a discrete r. v., we have the probability function P(X = x) = p(x)
  - for a continuous r. v., P(X = x) = 0; instead,

F(x) = ∫_(-∞)^x f(t) dt  and  f(x) = dF(x)/dx

- An example: an express mail arrives equally likely between 1 and 5 pm:

f(x) = 1/4 for 1 ≤ x ≤ 5, and f(x) = 0 elsewhere

Hence, the probability of receiving the express mail between 2 and 5 pm is P(2 ≤ X ≤ 5) = F(5) - F(2) = (5 - 1)/(5 - 1) - (2 - 1)/(5 - 1) = 3/4.

- The mean and the variance: E(X) = (a + b)/2, V(X) = (b - a)²/12
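A minimal sketch of the express-mail example, evaluating the CDF derived above together with the mean and variance formulas:

    # Continuous uniform check for the express-mail example on [a, b] = [1, 5].
    a, b = 1.0, 5.0

    def F(x):
        # CDF of the uniform distribution on [a, b]
        if x < a:
            return 0.0
        if x > b:
            return 1.0
        return (x - a) / (b - a)

    print(F(5) - F(2))                        # P(2 <= X <= 5) = 0.75
    print((a + b) / 2, (b - a) ** 2 / 12)     # mean 3.0, variance 1.333...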

5.8 Normal Distribution - In the natural world, there are many cases where the possible outcomes are not equally likely. Instead, there is a most likely value, and the likelihood decreases symmetrically on either side of it. This leads to the normal distribution.

- The normal distribution is by far the most widely used probability distribution. Why is the normal distribution so popular?
  - the central limit theorem: sums and averages of many independent random effects are approximately normal
  - a linear combination of normal random variables is still normal

- The probability density function:

f(x) = [1 / (σ √(2π))] e^(-(x - μ)² / (2σ²))

Note that the cumulative distribution function does not have a closed analytical form; hence, we rely on numerical calculation (Table A.3).

- The mean, variance, and standard deviation of a normal distribution: E(X) = μ, V(X) = σ², and the standard deviation is σ.

These two parameters uniquely determine the normal distribution. Hence, a normal distribution is often denoted N(μ, σ).

- Illustration of the normal distribution:
  - the bell shape
  - the mean μ at the center
  - the standard deviation: μ ± σ covers about 68% of the area, μ ± 2σ about 95.4%, and μ ± 3σ about 99.7%

- In particular, with E(X) = μ = 0 and V(X) = σ² = 1, we have the standard normal distribution N(0, 1).

- Calculating probabilities through the standard normal distribution:
  - translate a normal distribution to the standard normal distribution by Z = (X - μ) / σ

  - use the normal distribution table (Table A.3)

- An example: given N(16, 1), P(X > 17) = ?
  - Z = (X - 16)/1
  - P[Z > (17 - 16)/1] = P(Z > 1) = 1 - P(Z < 1) = 1 - 0.8413 (from Table A.3) = 0.1587
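Instead of Table A.3, the Python standard library's statistics.NormalDist can be used to check such values; a minimal sketch:

    from statistics import NormalDist

    # Check of the N(16, 1) example without a normal table.
    z = NormalDist()                        # standard normal N(0, 1)
    x_dist = NormalDist(mu=16, sigma=1)

    print(round(1 - z.cdf(1.0), 4))         # P(Z > 1)  = 0.1587
    print(round(1 - x_dist.cdf(17.0), 4))   # P(X > 17) = 0.1587, the same value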

- Questions:
  - given μ and σ, how do we calculate P(c1 ≤ X ≤ c2)?
  - given p, μ, and σ, how do we find x such that P(X > x) = p?

- Given a set of data, it is often necessary to check whether the data set follows a normal distribution.

- The student example - the numbers of hours of study of the 12 students:
  - sort the data: 10, 12, 12, 14, 14, 14, 15, 15, 15, 20, 20, 25
  - note that there are just 6 distinct values, so each distinct value accounts for about 100 / 6 = 16.7 percentage points
  - assign percentiles to the data: 16, 32, 32, 48, 48, 48, 64, 64, 64, 80, 80, 96
  - find the z-values of these percentiles: -1.0, -0.47, -0.47, -0.05, -0.05, -0.05, 0.36, 0.36, 0.36, 0.85, 0.85, 1.75
  - plot the data against the z-values:

[Normal probability plot: hours of study (10 to 25) on the vertical axis plotted against the z-values on the horizontal axis; the points fall roughly on a straight line.]
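The z-values used for the horizontal axis can be reproduced with statistics.NormalDist, whose inv_cdf gives the normal quantile of each assigned percentile; a sketch:

    from statistics import NormalDist

    # Normal scores behind the plot: each sorted observation is paired with
    # the z-value of its assigned percentile.
    hours = [10, 12, 12, 14, 14, 14, 15, 15, 15, 20, 20, 25]
    percentiles = [16, 32, 32, 48, 48, 48, 64, 64, 64, 80, 80, 96]   # as in the notes

    z_values = [round(NormalDist().inv_cdf(p / 100), 2) for p in percentiles]
    print(z_values)   # close to the z-values listed above; plot hours against them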

- Because the horizontal axis is the normal quantile (z) scale, the roughly linear relationship indicates that the distribution of the data can be approximated by a normal distribution.
- If a data set follows a normal distribution, then the related probability calculations can easily be done. Following the 12-student example: μ = 15.5, σ² = 16 (σ = 4).
Question: what is the probability of picking a student who studies at least 15 hours per week?
Answer: we first calculate the z-value: z = (15 - 15.5) / 4 = -0.125

hence, the probability is: P(Z > -0.125) = 1 - P(Z < -0.125) = 1 - 0.45 = 0.55

- As another example, suppose that an exam is coming and everybody puts in an extra 3 hours of study per week (so the mean becomes 18.5). What is the probability of picking a student who studies at least 20 hours per week? We first calculate the z-value: z = (20 - 18.5) / 4 = 0.375

hence, P(X > 20) = P(Z > 0.375) = 1 - P(Z < 0.375) = 1 - 0.64 = 0.36.

- As an exercise, you may want to find the range of hours of study per week that contains a randomly picked student's study time with probability 95%.

- Normal approximation to the binomial. Assuming n is large (and p is not too close to 0 or 1), then

Z = (X - np) / √(np(1 - p))

is approximately standard normally distributed. This can be demonstrated with the student example. The probability of picking a student who studies more than 15 hours per week is p = 3/12 = 1/4. Consider sampling 12 students with replacement; the probability that exactly 3 of them study more than 15 hours per week is b(X = 3, n = 12, p = 1/4) = 0.258.

Using the normal distribution to approximate: μ = np = (12)(1/4) = 3, σ² = np(1 - p) = (12)(1/4)(3/4) = 9/4 = 2.25 (σ = 1.5)

hence, with the continuity correction, P(2.5 < X < 3.5) = P[(2.5 - 3)/1.5 < Z < (3.5 - 3)/1.5] = P(-0.33 < Z < 0.33) ≈ 0.261

It is seen that the two results (0.258 vs. 0.261) are rather similar; the small approximation error is caused by the small n (n = 12).
- The normal approximation to the binomial distribution is very useful when n is large, because the binomial distribution would then require tedious calculation.
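A short sketch comparing the exact binomial probability with its normal approximation (continuity correction included) for this example:

    from math import comb
    from statistics import NormalDist

    # Exact binomial probability vs. the normal approximation for
    # n = 12, p = 1/4, X = 3 (the student example).
    n, p = 12, 0.25
    exact = comb(n, 3) * p**3 * (1 - p) ** (n - 3)

    mu, sigma = n * p, (n * p * (1 - p)) ** 0.5          # 3.0 and 1.5
    approx = NormalDist(mu, sigma).cdf(3.5) - NormalDist(mu, sigma).cdf(2.5)

    print(round(exact, 3), round(approx, 3))             # 0.258 vs 0.261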

5.9 Exponential distribution, Gamma distribution and Chi-Square (χ²) distribution - There are cases, for example the time to failure, in which the probability density decreases exponentially. This leads to the exponential distribution.

- the probability density function of the exponential distribution:

f(x) = (1/β) exp(-x/β) for x > 0, β > 0; f(x) = 0 elsewhere

- the probability function

F(x) = 1 - exp(-x/β),  x > 0, β > 0

- To calculate the mean and variance, we need the Gamma function Γ(α):

Γ(α) = ∫_0^∞ x^(α-1) e^(-x) dx

Using integration by parts: (uv)' = u'v + uv', so uv = ∫ u'v dx + ∫ uv' dx, or ∫ u v' dx = uv - ∫ u' v dx.

Let u = x^(α-1) and dv = e^(-x) dx; it follows that:

Γ(α) = [-e^(-x) x^(α-1)]_0^∞ + (α - 1) ∫_0^∞ e^(-x) x^(α-2) dx = (α - 1) Γ(α - 1)

In particular: Γ(α + 1) = α Γ(α), Γ(n) = (n - 1)! for integer n, and Γ(1/2) = √π

In general:

∫_0^∞ x^(α-1) e^(-x/β) dx = β^α Γ(α)

For the exponential distribution, since α = 1: E(X) = β, V(X) = β²

- The exponential distribution is related to the Poisson distribution: for a Poisson process with mean λt, the waiting time until the first occurrence is exponentially distributed with mean β = 1/λ.
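A small numerical illustration of this link, for an assumed rate of λ = 2 occurrences per unit time: the Poisson probability of zero events in (0, t) equals the exponential survival probability with β = 1/λ:

    from math import exp

    # Waiting time T to the first occurrence exceeds t exactly when
    # no event occurs in (0, t).
    lam = 2.0            # assumed rate of occurrences per unit time
    beta = 1 / lam       # mean of the corresponding exponential distribution

    for t in (0.5, 1.0, 2.0):
        p_no_event = exp(-lam * t)        # Poisson pmf at x = 0 with mean lam*t
        p_wait_gt_t = exp(-t / beta)      # exponential P(T > t)
        print(t, round(p_no_event, 4), round(p_wait_gt_t, 4))   # identical columns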

- Another common case is one in which the probability density is low near zero; this leads to the Gamma distribution. The probability density function of the Gamma distribution:

f(x) = [1 / (β^α Γ(α))] x^(α-1) e^(-x/β),  x > 0, α > 0, β > 0

- The mean and variance: E(X) = αβ, V(X) = αβ²

- Note that the exponential distribution is a special case of the Gamma distribution with α = 1.

- Another special case of the Gamma distribution is the χ² distribution. Letting α = ν/2 and β = 2 gives the χ² distribution with ν degrees of freedom:

f(x) = [1 / (2^(ν/2) Γ(ν/2))] x^(ν/2 - 1) e^(-x/2),  x > 0

its mean and variance are as follows: μ = ν, σ² = 2ν

- Illustration: [Figure: typical probability density curves of the Gamma (or χ²) distribution and of the exponential distribution.]

5.10 Weibull distribution - The assumption: similar to Gamma - The probability density function:

f(x) = αβ x^(β-1) e^(-αx^β) for x > 0;  f(x) = 0 otherwise

- The probability function: F(x) = 1 - exp(-α x^β),  x > 0

- The mean and variance:

E(X) = α^(-1/β) Γ(1 + 1/β)

V(X) = α^(-2/β) { Γ(1 + 2/β) - [Γ(1 + 1/β)]² }
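As a sanity check of the mean formula for one assumed pair of parameter values, the sketch below integrates x·f(x) numerically and compares it with α^(-1/β) Γ(1 + 1/β):

    from math import exp, gamma

    # Weibull pdf f(x) = alpha * beta * x^(beta - 1) * exp(-alpha * x^beta).
    alpha, beta = 2.0, 1.5        # assumed parameter values

    def pdf(x):
        return alpha * beta * x ** (beta - 1) * exp(-alpha * x**beta)

    dx = 1e-4
    mean_numeric = sum((i * dx) * pdf(i * dx) * dx for i in range(1, int(20 / dx)))
    mean_formula = alpha ** (-1 / beta) * gamma(1 + 1 / beta)
    print(round(mean_numeric, 4), round(mean_formula, 4))   # both approximately 0.5687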

- Application in reliability, defining:
  - f(t): the pdf of the time to failure
  - F(t): the CDF of the time to failure
  - R(t) = 1 - F(t): the probability of no failure up to time t (the reliability function)
  - r(t) = f(t) / R(t): the failure rate function

if the failure rate is constant,

r(t) = f(t) / R(t) = f(t) / (1 - F(t)) = 1/β,

then f(t) will be exponential.

- Proof: since dF(t)/dt = f(t), the condition becomes β F'(t) = 1 - F(t), i.e., β F'(t) + F(t) = 1

Solving the above differential equation gives: F(t) = 1 - exp(-t/β),  t ≥ 0

or f(t) = (1/β) exp(-t/β),  t ≥ 0
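A small sketch (with an assumed β = 4) confirming that an exponential failure time gives a constant failure rate r(t) = 1/β:

    from math import exp

    # Failure rate of the exponential distribution.
    beta = 4.0                                   # assumed mean time to failure

    f = lambda t: (1 / beta) * exp(-t / beta)    # pdf of the failure time
    F = lambda t: 1 - exp(-t / beta)             # CDF of the failure time
    r = lambda t: f(t) / (1 - F(t))              # failure rate function

    print([round(r(t), 4) for t in (0.5, 1, 2, 5, 10)])   # all equal 1/beta = 0.25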

5.11 Summary - Discrete distributions:
  - discrete uniform: all values equally likely
  - binomial and multinomial: number of successes in n independent Bernoulli trials
  - hypergeometric: dependent sampling (sampling without replacement from a finite population)
  - negative binomial: the trial on which the kth success occurs
  - geometric: number of trials until the first success
  - Poisson: discrete events occurring over continuous intervals of time or space

- Continuous distributions:
  - uniform: equally likely over an interval
  - normal: a most likely value, with the likelihood decreasing symmetrically
  - exponential: density decreasing exponentially
  - Gamma: density small near zero (a generalization of the exponential)
  - Beta: confined to a finite interval
  - Weibull: another generalization of the exponential, widely used in reliability
