Chapter 3 – Random Variables and Probability Distributions

Defn: An experiment is a test or series of tests in which purposeful changes are made to the input variables of a process or system so that we may observe and identify reasons for changes in the output response.

Defn: A random experiment is one whose outcome cannot be predicted with certainty.

Example: To determine optimum conditions for a plating bath, the effects of sulfone concentration and bath temperature on the reflectivity of the plated metal are studied. Two levels of sulfone concentration (in grams/liter) and five levels of temperature (degrees F) were used, with three replications. (Example from Miller & Freund’s Probability & Statistics for Engineers, by R. A. Johnson). In this case, there are two experimental factors – concentration and temperature – with two levels of concentration and five levels of temperature. The (random) response variable is reflectivity.

Defn: A random variable is a numerical variable whose measured value is determined by chance.

Note: We will denote a random variable with an uppercase letter, such as X, and a measured value of the random variable with a lowercase letter, such as x.

Example: In the experiment described above, reflectivity is affected by concentration and temperature, but it is also affected by other factors not explicitly included in the experiment. Hence, reflectivity is a random variable.

A random variable is called continuous if the set of possible values is some interval(s) of the real numbers. A random variable is called discrete if the set of possible values is either finite or countably infinite.

Example: Continuous random variables – electric current, reflectivity (above example), temperature, pressure. Discrete random variables – number of defective parts in a shipment of parts, number of people in a poll who support a particular candidate for President, number of accidents happening at the intersection of Beach Boulevard and Kernan Boulevard in a month.

Probability

There are two primary interpretations of probability:

1) Subjective approach: Probability values are assigned based on educated guesses about the relative likelihoods of the different possible outcomes of our random experiment. This approach involves advanced concepts and principles, such as entropy.

2) Relative frequency approach: In this approach to assigning probabilities to events, we look at the long-run proportion of occurrences of particular outcomes, when the random experiment is performed many times. This long-run proportion tells us the approximate probability of occurrence of each outcome.

Example: If we flip a coin once, what is the likelihood that the outcome is a head? Why? For a single coin flip, we cannot say with certainty what the outcome will be. However, if we flip a coin 1,000,000 times, we are fairly sure that approximately one-half of the outcomes will be heads.

This approach is based on the Law of Large Numbers, which says, in particular, that the relative frequency of occurrence of a particular outcome of a random experiment approaches a specific limiting number between 0 and 1 if we perform the experiment a very large number of times.
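The Law of Large Numbers can be illustrated with a short simulation; the function name, seed, and flip counts below are illustrative choices, not from the text:

```python
import random

def relative_frequency_of_heads(n_flips, seed=0):
    """Proportion of heads in n_flips simulated fair-coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency settles near the true probability 0.5 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))
```

As the number of flips increases, the printed proportions cluster more tightly around 0.5, which is the limiting value the Law of Large Numbers guarantees.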

Defn: A set is a collection of elements.

Defn: Given a set , another set A is called a subset of , denoted A   , if every element of A is also an element of .

Defn: Given a set , and two sets A   and B   , we define the union of A and B, denoted by A  B , to be the set of all elements of  that are either elements of A or elements of B or elements of both A and B.

Note: If  is the sample space of a random experiment, then A and B are events, and A  B is the event that either A or B (or both A and B) happens when we perform the experiment.

Defn: Given a set , and two sets A   and B   , we define the intersection of A and B, denoted by A  B , to be the set of all elements of  that are elements of both A and B.

Note: If  is the sample space of a random experiment, then A and B are events, and A  B is the event that both A and B happen when we perform the experiment.

Defn: The empty set, or null set, ∅, is the set that contains no elements.

Note: The null set is a subset of every set.

Defn: Two sets A, B are said to be mutually exclusive if A ∩ B = ∅.

Defn: The complement A′ of a set A ⊆ ℝ is A′ = {x ∈ ℝ : x ∉ A}, where ℝ is the set of real numbers.

Basic Laws of Probability (Kolmogorov’s Axioms, in terms of a random variable X):
1) P(X ∈ ℝ) = 1.
2) 0 ≤ P(X ∈ E) ≤ 1, for any E ⊆ ℝ.

3) If E1, E2, E3, …, En ⊆ ℝ are mutually exclusive, then

PX  E1  E2  E3   En   PX  E1  PX  E2  PX  E3   PX  En  Note: Laws 1, 2, and 3 imply the complement rule: For any set E, PE 1 PE. 3 Note: We may generalize Kolmogorov’s Axioms from subsets of real numbers to subsets of the set of all possible outcomes of a random experiment. For example, if our random experiment is to flip a fair coin twice, the set of all possible outcomes, called the sample space of the experiment, is   HH, HT, TH , TT. Any subset of a sample space is called an event. The following 3 laws apply when we consider all 16 subsets of the sample space of this experiment: 1) P  1 , 2) For any A   , 0  PA  1 . 3) If A, B   such that A  B   , then PA  B  PA PB .

Example: From handout.

Example: p. 53, Exercise 3-13.

Continuous Random Variables

Defn: The probability distribution of a random variable X is the set {(A, P(A)) : A ⊆ ℝ}.

Mathematically, the two types of random variables – continuous and discrete – must be handled differently.

Under certain simple conditions, we may describe the distribution for a continuous random variable using a probability density function.

Defn: If the distribution of a continuous random variable has a probability density function, f(x), then for any interval (a, b), we have P(a ≤ X ≤ b) = ∫_a^b f(x) dx. The probability density function (p.d.f.) has the following properties, which follow from Kolmogorov’s Axioms:
1) f(x) ≥ 0 everywhere;
2) ∫_{−∞}^{+∞} f(x) dx = 1.

Note: If X is a continuous r.v., then P(X = x) = 0 for any x. (Think about this.)

Note: As a result, we have P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b).
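A quick numerical check of these facts, using an illustrative density f(x) = 2x on (0, 1) and a simple midpoint-rule integrator (both are examples, not from the text):

```python
def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    """A valid p.d.f.: f(x) = 2x on (0, 1), and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

# P(0.2 <= X <= 0.5) = 0.5^2 - 0.2^2 = 0.21
print(integrate(f, 0.2, 0.5))
# Total probability is 1, and an integral over a single point is 0, so P(X = x) = 0.
print(integrate(f, 0.0, 1.0), integrate(f, 0.3, 0.3))
```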

Example: p. 59, Exercise 3-19

Defn: The cumulative distribution function (or c.d.f.) for a continuous r.v. X is given by F(x) = P(X ≤ x) = ∫_{−∞}^x f(t) dt, for all x ∈ ℝ. If the distribution does not have a p.d.f., we may still define the c.d.f. for any x as the probability that X takes on a value no greater than x.

Note: The c.d.f. for the distribution of a r.v. is unique, and completely describes the distribution.

Example: p. 59, Exercise 3-19.

Mean and Variance

Defn: The mean, or expected value, or expectation, of a continuous r.v. X with p.d.f. f(x) is given by μ = E[X] = ∫_{−∞}^{+∞} x f(x) dx.

Example: p. 59, Exercise 3-19.

Note: We interpret the mean in terms of relative frequency. If we were to repeatedly take measurements of the random variable X, recording all of our measurements and calculating the average after each measurement, the value of the average would approach a limit as we continued to take measurements, and this limit is the expectation of X.

Defn: Let X be a continuous r.v. with p.d.f. f(x), and mean μ. The variance of X, or the variance of the distribution of X, is given by σ² = V(X) = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)² f(x) dx. The standard deviation of X is just the square root of the variance.

Note: In practice, it is easier to use the computational formula for the variance, rather than the defining formula: σ² = E[X²] − μ² = ∫_{−∞}^{+∞} x² f(x) dx − μ².
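The computational formula can be checked numerically on an illustrative density, f(x) = 2x on (0, 1), whose mean is 2/3 and variance 1/18; the midpoint-rule integrator is a minimal sketch:

```python
def integrate(f, a, b, n=200_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def f(x):
    """Illustrative p.d.f. on (0, 1)."""
    return 2 * x

mu = integrate(lambda x: x * f(x), 0, 1)                # E[X] = 2/3
second_moment = integrate(lambda x: x**2 * f(x), 0, 1)  # E[X^2] = 1/2
var = second_moment - mu**2                             # E[X^2] - mu^2 = 1/18
print(mu, var)
```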

Example: p. 59, Exercise 3-19

Defn: The kth moment of the distribution of X is E[X^k].

The Uniform Distribution

Consider a continuous r.v. X whose distribution has p.d.f. f(x) = 1/(b − a), for a ≤ x ≤ b, and f(x) = 0, otherwise. We say that X has a uniform distribution on the interval (a, b), abbreviated X ~ Uniform(a, b). If we take a measurement of X, we are equally likely to obtain any value within the interval. Hence, for some subinterval (c, d) ⊆ (a, b), we have P(c ≤ X ≤ d) = ∫_c^d 1/(b − a) dx = (d − c)/(b − a).

The mean of the uniform distribution is μ = ∫_{−∞}^{+∞} x f(x) dx = ∫_a^b x/(b − a) dx = [x²/(2(b − a))]_a^b = (a + b)/2, the midpoint of the interval (a, b).

The second moment of the distribution is E[X²] = ∫_{−∞}^{+∞} x² f(x) dx = ∫_a^b x²/(b − a) dx = (b³ − a³)/(3(b − a)) = (b² + ab + a²)/3. Then the variance is σ² = E[X²] − μ² = (b² + ab + a²)/3 − (a + b)²/4 = (b − a)²/12, and the standard deviation is σ = (b − a)/(2√3).
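These formulas can be checked by simulation; the interval endpoints a = 2 and b = 10 below are illustrative:

```python
import random
import statistics

a, b = 2.0, 10.0           # illustrative interval endpoints
rng = random.Random(1)
sample = [rng.uniform(a, b) for _ in range(200_000)]

print(statistics.fmean(sample), (a + b) / 2)          # sample mean vs. (a + b)/2
print(statistics.pvariance(sample), (b - a)**2 / 12)  # sample variance vs. (b - a)^2/12
```

The simulated mean and variance land close to the theoretical values (a + b)/2 and (b − a)²/12.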

Note: the longer the interval (a, b), the larger the values of the variance and standard deviation.

The Normal Distribution

The normal distribution is a special type of bell-shaped curve.

Defn: A random variable X is said to be normally distributed or to have a normal distribution if its p.d.f. has the form f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)), for −∞ < x < ∞, −∞ < μ < ∞, and σ > 0. Here μ and σ are the parameters of the distribution; μ = the mean of the random variable X (or of the probability distribution); and σ = the standard deviation of X.

Note: The normal distribution is not just a single distribution, but rather a family of distributions; each member of the family is characterized by a particular pair of values of μ and σ.

The graph of the p.d.f. has the following characteristics: 1) It is a bell-shaped curve; 2) It is symmetric about μ; 3) The inflection points are at μ − σ and μ + σ.

The normal distribution is very important in statistics for the following reasons: 1) Many phenomena occurring in nature or in industry have normal, or approximately normal, distributions. Examples: a) heights of people in the general population of adults; b) for a particular species of pine tree in a forest, the trunk diameter at a point 3 feet above the ground; c) fill weights of 12-oz. cans of Pepsi-Cola; d) IQ scores in the general population of adults; e) diameters of metal shafts used in disk drive units. 2) Under general conditions (independence of members of a sample), the possible values of the sample mean for samples of a given (large) size have an approximate normal distribution (Central Limit Theorem).

The Empirical Rule: For the normal distribution, 1) The probability that X will be found to have a value in the interval (μ − σ, μ + σ) is approximately 0.6827; 2) The probability that X will be found to have a value in the interval (μ − 2σ, μ + 2σ) is approximately 0.9545; 3) The probability that X will be found to have a value in the interval (μ − 3σ, μ + 3σ) is approximately 0.9973.
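The Empirical Rule probabilities can be reproduced with the standard-library normal distribution, statistics.NormalDist:

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mu = 0, sigma = 1
for k in (1, 2, 3):
    prob = Z.cdf(k) - Z.cdf(-k)   # P(mu - k*sigma < X < mu + k*sigma)
    print(k, round(prob, 4))
# prints: 1 0.6827, then 2 0.9545, then 3 0.9973
```

By symmetry the same three probabilities hold for any Normal(μ, σ), since standardizing maps (μ − kσ, μ + kσ) onto (−k, k).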

Unfortunately, the p.d.f. of the normal distribution does not have a closed-form anti-derivative. Probabilities must be calculated using numerical integration methods. This difficulty is the reason for the importance of a particular member of the family of normal distributions, the standard normal distribution, which has p.d.f. f(z) = (1/√(2π)) e^{−z²/2}, for −∞ < z < ∞.

Note: For shorthand, we will write X ~ Normal(μ, σ) to mean that the continuous r.v. X has a normal distribution with mean μ and standard deviation σ. The c.d.f. of the standard normal distribution will be denoted by Φ(z) = P(Z ≤ z) = ∫_{−∞}^z (1/√(2π)) e^{−w²/2} dw. Values of this function have been tabulated in Table 1 of Appendix A.

Examples: p. 64

The reason that the standard normal distribution is so important is that, if X ~ Normal(μ, σ), then Z = (X − μ)/σ ~ Normal(0, 1).
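A sketch of this standardization, using illustrative values μ = 100 and σ = 15 (an IQ-like scale): computing P(a < X < b) directly and via Z gives the same answer.

```python
from statistics import NormalDist

mu, sigma = 100, 15          # illustrative parameters, not from the text
X = NormalDist(mu, sigma)
Z = NormalDist()             # standard normal

a, b = 85, 130
direct = X.cdf(b) - X.cdf(a)                                     # P(a < X < b) directly
standardized = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma) # via standardization
print(direct, standardized)   # the two computations agree
```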

Example: p. 74, Exercise 3-38 a, b.

In statistical inference, we will have occasion to reverse the above procedure. Rather than finding the probability associated with a given interval, we will want to find the end point of an interval corresponding to a given tail probability for a normal distribution. I.e., we will want to find percentiles of the normal distribution, by inverting the distribution function Φ(z).

Examples: a) Find the 90th percentile of the standard normal distribution. b) Find the 95th percentile of the standard normal distribution. c) Find the 97.5th percentile of the standard normal distribution.
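One way to check such percentile lookups against Table 1 is with statistics.NormalDist.inv_cdf, which inverts Φ numerically:

```python
from statistics import NormalDist

Z = NormalDist()
for p in (0.90, 0.95, 0.975):
    print(p, round(Z.inv_cdf(p), 4))
# 0.90 -> 1.2816, 0.95 -> 1.6449, 0.975 -> 1.96 (approximately)
```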

Example: p. 74, Exercise 3-40 c.

Note: Mean and standard deviation of the Normal(μ, σ) distribution. Although the p.d.f. cannot be integrated in closed form, the mean and variance may easily be found by integration.

Lognormal Distribution

Defn: We say that a continuous r.v. X has a lognormal distribution with parameters θ and ω if the natural logarithm of X has a normal distribution. The p.d.f. of X is f(x) = (1/(xω√(2π))) exp(−(ln(x) − θ)²/(2ω²)), for 0 < x < ∞, and f(x) = 0, for x ≤ 0.

The mean and variance of X are μ = E[X] = e^{θ + ω²/2} and σ² = V(X) = e^{2θ + ω²}(e^{ω²} − 1). These may easily be seen by using a change of variable and the results for the mean and variance of the normal distribution. The parameters θ and ω² are the mean and variance of the r.v. W = ln(X). We write X ~ lognormal(θ, ω) to denote that X has a lognormal distribution with parameters θ and ω.

Note: The c.d.f. for X is given by F(x) = P(X ≤ x) = P(W ≤ ln(x)) = P(Z ≤ (ln(x) − θ)/ω) = Φ((ln(x) − θ)/ω), for x > 0, and F(x) = 0, for x ≤ 0. Hence, we may find probabilities associated with X by using Table 1 in Appendix A.

Note: This distribution is often applied to model the lifetimes of systems that degrade over time.
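The c.d.f. relationship F(x) = Φ((ln(x) − θ)/ω) translates directly into code; the parameter values below are illustrative:

```python
import math
from statistics import NormalDist

def lognormal_cdf(x, theta, omega):
    """F(x) = Phi((ln(x) - theta) / omega) for x > 0, and 0 for x <= 0."""
    if x <= 0:
        return 0.0
    return NormalDist().cdf((math.log(x) - theta) / omega)

theta, omega = 1.0, 0.5    # illustrative parameter values
mean = math.exp(theta + omega**2 / 2)                         # e^{theta + omega^2/2}
variance = math.exp(2*theta + omega**2) * (math.exp(omega**2) - 1)
print(lognormal_cdf(5.0, theta, omega), mean, variance)
```

Note that lognormal_cdf(e^θ, θ, ω) = Φ(0) = 0.5, so e^θ is the median (not the mean) of the distribution.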

Example: p. 75, Exercise 3-50

Gamma Distribution

Defn: The gamma function is defined by the integral Γ(r) = ∫_0^{+∞} t^{r−1} e^{−t} dt, for r > 0. It may be shown using integration by parts that Γ(r) = (r − 1)Γ(r − 1). Hence, in particular, if r is a positive integer, Γ(r) = (r − 1)!. We also have Γ(0.5) = √π.
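These properties of the gamma function can be verified with math.gamma from the standard library:

```python
import math

# Gamma(r) = (r - 1)! for positive integers r
for r in (1, 2, 3, 4, 5):
    print(r, math.gamma(r), math.factorial(r - 1))

# Recurrence Gamma(r) = (r - 1) * Gamma(r - 1), and Gamma(0.5) = sqrt(pi)
print(math.gamma(3.7) - 2.7 * math.gamma(2.7))   # essentially 0
print(math.gamma(0.5) - math.sqrt(math.pi))      # essentially 0
```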

Defn: A continuous r.v. X is said to have a gamma distribution with parameters r > 0 and λ > 0 if the p.d.f. of X is f(x) = (λ^r / Γ(r)) x^{r−1} e^{−λx}, for x > 0, and f(x) = 0, for x ≤ 0. The mean and variance of X are given by μ = E[X] = r/λ and σ² = V(X) = r/λ². We write X ~ gamma(r, λ) to denote that X has a gamma distribution with parameters r and λ.

It may be easily shown that the integral of the gamma p.d.f. over the interval (0, +∞) is 1, using the definition of the gamma function.

The gamma distribution is very important in statistical inference, both in its own right and because it is the basis for constructing some other distributions useful in inference. For example, the “signal-to-noise” ratio statistic that we will use in analyzing the results of scientific experiments is based on a ratio of random variables which have gamma distributions of a particular form. The graphs of some gamma p.d.f.’s are shown on p. 72.

Defn: A continuous r.v. X is said to have a chi-squared distribution with k degrees of freedom if X ~ gamma(k/2, 0.5).

Weibull Distribution

Defn: A continuous r.v. X is said to have a Weibull distribution with parameters β > 0 and δ > 0 if the p.d.f. of X is

f(x) = (β/δ)(x/δ)^{β−1} exp(−(x/δ)^β), for x > 0, and f(x) = 0, for x ≤ 0. The mean and variance of X are μ = E[X] = δΓ(1 + 1/β) and σ² = V(X) = δ²Γ(1 + 2/β) − μ². We write X ~ Weibull(β, δ).

The c.d.f. for a Weibull(β, δ) distribution is given by F(x) = 1 − exp(−(x/δ)^β), for x > 0, and F(x) = 0, for x ≤ 0.

The Weibull distribution is used to model the reliability of many different types of physical systems. Different combinations of values of the two parameters lead to models with either a) increasing failure rates over time, b) decreasing failure rates over time, or c) constant failure rates over time.
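A sketch of the Weibull c.d.f. and of sampling from it by inverse transform; the shape and scale values below are illustrative:

```python
import math
import random

def weibull_cdf(x, beta, delta):
    """F(x) = 1 - exp(-(x/delta)^beta) for x > 0, and 0 for x <= 0."""
    return 1.0 - math.exp(-((x / delta) ** beta)) if x > 0 else 0.0

beta, delta = 2.0, 3.0     # illustrative shape and scale values
mean = delta * math.gamma(1 + 1 / beta)   # theoretical mean: delta * Gamma(1 + 1/beta)

# Inverse-transform sampling: solving F(x) = u for x gives x = delta * (-ln(1 - u))^(1/beta)
rng = random.Random(2)
sample = [delta * (-math.log(1 - rng.random())) ** (1 / beta) for _ in range(100_000)]
print(mean, sum(sample) / len(sample))   # sample mean should be close to the theoretical mean
```

At x = δ the c.d.f. equals 1 − e^{−1} ≈ 0.632 regardless of β, which is why δ is often called the characteristic life in reliability work.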

Example: p. 75, Exercise 3-53.