
Part V - Chance Variability

Dr. Joseph Brennan

Math 148, BU

Law of Averages

In Chapter 13 we discussed the Kerrich coin-tossing experiment. Kerrich was a South African who spent World War II as a Nazi prisoner. He spent his time flipping a coin 10,000 times, faithfully recording the results.


Law of Averages: If an experiment is independently repeated a large number of times, the percentage of occurrences of a specific event E will be close to the theoretical probability of the event occurring, but off by some amount, the chance error.


As the coin toss was repeated, the percentage of heads approaches its theoretical expectation: 50%.


Caution

The Law of Averages is commonly misunderstood as the Gambler's Fallacy:

"By some magic everything will balance out. With a run of 10 heads, a tail is becoming more likely."

This is very false. After a run of 10 heads the probability of tossing a tail is still 50%!


In fact, the number of heads above half quickly increases as the experiment proceeds. A gambler betting on tails and hoping for balance would be devastated, as tails appeared about 134 fewer times than heads after 10,000 tosses.


In our coin-flipping experiment, the number of heads will be around half the number of tosses, plus or minus the chance error.

As the number of tosses goes up, the chance error gets larger in absolute terms. However, when viewed relatively, the chance error as a percentage decreases.
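To see both effects at once, here is a minimal Python simulation (a sketch, not from the slides; the seed and trial counts are arbitrary choices):

```python
import random

random.seed(148)  # arbitrary seed, for reproducibility

for n in [100, 1_000, 10_000, 100_000]:
    heads = sum(random.random() < 0.5 for _ in range(n))
    chance_error = heads - n / 2              # absolute chance error
    pct_error = 100 * abs(chance_error) / n   # chance error as a percentage
    print(f"n={n:>7}: heads={heads:>6}, error={chance_error:+.0f}, "
          f"pct={pct_error:.2f}%")
```

In a typical run the absolute error grows into the dozens while the percentage error falls well below 1%.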

Sample Spaces

Recall that a sample space S lists all the possible outcomes of a study.

Example (3 coins): We can record an outcome as a string of heads and tails, such as HHT. The corresponding sample space is

S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.

It is often more convenient to deal with outcomes as numbers, rather than as verbal statements. Suppose we are interested in the number of heads. Let X denote the number of heads in 3 tosses. For instance, if the outcome is HHT, then X = 2. The possible values of X are 0, 1, 2, and 3. For every outcome from S, X will take a particular value:

Outcome  HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X         3    2    2    2    1    1    1    0

Random Variable

Random Variable: An unknown quantity subject to random change. Often a random variable will be an unknown numerical result of a study. A random variable has a numerical sample space where each outcome has an assigned probability. The assigned probabilities are not necessarily equal. The quantity X in the previous example is a random variable because its value is unknown unless the tossing experiment is performed.

Definition: A random variable is an unknown numerical result of a study.

Mathematically, a random variable is a function which assigns a numerical value to each outcome in a sample space S.

Example (3 coins)

We have two different sample spaces for our 3 coin experiment:

S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.
S∗ = {0, 1, 2, 3}

The sample space S describes 8 equally likely outcomes for our coin flips, while the sample space S∗ describes 4 outcomes which are not equally likely. Recall that S∗ represents the values of the random variable X, the number of heads resulting from three coin flips.

P(X = 0) = P(TTT) = (1/2) · (1/2) · (1/2) = 1/8
P(X = 1) = P(HTT or TTH or THT) = 3/8
P(X = 2) = 3/8,  P(X = 3) = 1/8

S∗ does not contain information about the order of heads and tails.

Discrete and Continuous Random Variables

Discrete Random Variables: A discrete random variable has a number of possible values which can be listed. Mathematically we say the number of possible values is countable. The variable X in Example (3 coins) is discrete. Simple actions are discrete: rolling dice, flipping coins, dealing cards, drawing names from a hat, spinning a wheel, . . .

Continuous Random Variables: A continuous random variable takes values in an interval of numbers. It is impossible to list or count all the possible values of a continuous random variable. Mathematically we say the number of possible values is uncountable. For the data on heights of people, the average height x̄ is a continuous random variable which takes on values from some interval, say, [0, 200] (in inches).
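The 3-coin distribution above can be checked by brute-force enumeration; the following sketch (not part of the original example) lists all 8 outcomes of S and tallies X:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# The 8 equally likely outcomes of S, e.g. 'HHT'
outcomes = ["".join(s) for s in product("HT", repeat=3)]
# X = number of heads; count how many outcomes give each value
counts = Counter(o.count("H") for o in outcomes)

for x in sorted(counts):
    print(f"P(X = {x}) = {Fraction(counts[x], len(outcomes))}")
# P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8, P(X = 3) = 1/8
```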

Probability Distributions

Any random variable X, discrete or continuous, can be described with:

- A probability distribution.
- A mean and standard deviation.

The probability distribution of a random variable X is defined by specifying the possible values of X and their probabilities. For discrete random variables the probability distribution is given by the probability table and is represented graphically as the probability histogram. For continuous random variables the probability distribution is given by the probability density function and is represented graphically by the density curve. Recall that we discussed density curves in Part II.

The Mean of a Random Variable X

In Part II (Descriptive Statistics) we discussed the mean and standard deviation, x̄ and s, of data sets to measure the center and spread of the observations. Similar definitions exist for random variables: The mean of the random variable X, denoted µ, measures the centrality of the probability distribution.

The mean µ is computed from the probability distribution of X as a weighted average of the possible values of X with weights being the probabilities of these values.

The Expected Value

The mean µ of a random variable X is often called the expected value of X . It means that the observed value of a random variable is expected to be around its expected value; the difference is the chance error. In other words,

observed value of X = µ + chance error

We never expect a random variable X to be exactly equal to its expected value µ. The likely size of the chance error can be determined by the standard deviation, denoted σ. The standard deviation σ measures the distribution’s spread and is a quantity which is computed from the probability distribution of X .

Random Variable X and Population

A population of interest is often characterized by the random variable X . Example: Suppose we are interested in the distribution of American heights. The random variable X (height) describes the population (US people).

The distribution of X is called the population distribution, and the distribution parameters, µ and σ, are the population parameters.

Population parameters are fixed constants which are usually unknown and need to be estimated. A sample (data set) should be viewed as values (realizations) of the random variable X drawn from the probability distribution. The sample mean x̄ and standard deviation s estimate the unknown population mean µ and standard deviation σ.

Discrete Random Variables

The distribution of a discrete random variable X is summarized in the distribution table:

Value of X   x1  x2  x3  ...  xk
Probability  p1  p2  p3  ...  pk

The symbols xi represent the distinct possible values of X and pi is the probability associated with xi.

p1 + p2 + ... + pk = 1 (or 100%)

This is due to all possible values of X being listed in the sample space S = {x1, x2,..., xk }.

The events X = xi and X = xj, i ≠ j, are disjoint since the random variable X cannot take two distinct values at the same time.

Example (Fish)

A resort on a lake claims that the distribution of the number of fish X in the daily catch of an experienced fisherman is given below.

x          0     1     2     3     4     5     6     7
P(X = x)  0.02  0.08  0.10  0.18  0.25  0.20  0.15  0.02

Find the following:
(a) P(X ≥ 5): 0.20 + 0.15 + 0.02 = 0.37
(b) P(2 < X < 5): 0.18 + 0.25 = 0.43
(c) y if P(X ≤ y) = 0.2: y = 2
(d) y if P(X > y) = 0.37: y = 4
(e) P(X ≠ 5): 1 − 0.20 = 0.80
(f) P(X < 2 or X = 6): 0.25
(g) P(X < 2 and X > 4): 0
(h) P(X = 9): 0
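As a check, the answers above can be reproduced by summing entries of the table. A sketch (the prob helper is hypothetical, and results are subject to small floating-point rounding):

```python
# The distribution table as a dict; the probabilities sum to 1.
dist = {0: 0.02, 1: 0.08, 2: 0.10, 3: 0.18,
        4: 0.25, 5: 0.20, 6: 0.15, 7: 0.02}

def prob(event):
    """Add up P(X = x) over the values x where the event holds."""
    return sum(p for x, p in dist.items() if event(x))

print(prob(lambda x: x >= 5))           # (a) 0.37
print(prob(lambda x: 2 < x < 5))        # (b) 0.43
print(prob(lambda x: x != 5))           # (e) 0.80
print(prob(lambda x: x < 2 or x == 6))  # (f) 0.25
print(prob(lambda x: x == 9))           # (h) 0.0
```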

Probability Histograms

The graphical display of the probability distribution of a discrete random variable X is called the probability histogram. There are k bars, where k is the number of possible values of X.

The i-th bar is centered at xi, has unit width and height pi. The areas of the bars display the assignment of probabilities to possible values of X.

Example (3 coins): The distribution table for X, the number of heads after 3 coin flips, is given below:

X      0    1    2    3
P(X)  1/8  3/8  3/8  1/8

Example (3 coins): The Probability Histogram

The probability histogram for the 3 coins example is shown below.

Probability Histograms and Data Histograms

Do not confuse the probability histogram and the data (empirical) histogram!

The probability histogram is a theoretical histogram which shows the probabilities of possible outcomes. Each bar on the probability histogram shows the probability of a certain outcome.

The data histogram is an empirical histogram which shows the distribution of observed outcomes. Each bar on the data histogram represents the observed frequency of that outcome.

As the probability is a long-run frequency, we should think of probability histograms as idealized pictures of the results of very many trials.

Example (Two Dice)

Two dice are rolled. Find the distribution of the total and plot its probability histogram.

Solution: Let X denote the sum on the two dice. There are 11 possible values of X.

Value of X    2     3     4     5     6     7     8     9     10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
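This table can be generated by enumerating the 36 equally likely pairs; a sketch (note that Fraction prints in lowest terms, e.g. 2/36 as 1/18):

```python
from collections import Counter
from fractions import Fraction

# Count each total over the 36 equally likely (die1, die2) pairs
sums = Counter(a + b for a in range(1, 7) for b in range(1, 7))

for total in sorted(sums):
    print(f"P(X = {total:>2}) = {Fraction(sums[total], 36)}")
# e.g. P(X = 2) = 1/36, rising to P(X = 7) = 1/6, falling back to 1/36
```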


A computer simulated throwing a pair of dice, and the experiment was repeated 100 times, 1000 times and then 10,000 times. The empirical histograms for the sums are plotted below:

We can see that the empirical histogram converges (gets closer and closer) to the probability histogram as the number of repetitions increases.

Discrete Random Variable: µ and σ

Mean: The mean µ of a discrete random variable is found by multiplying each possible value by its probability and adding together all the products:

µ = x1p1 + x2p2 + ... + xkpk  =  Σ xipi, summing over i = 1, ..., k.

Standard Deviation: The standard deviation σ of a discrete random variable is found with the aid of µ:

σ = √[ (x1 − µ)²p1 + (x2 − µ)²p2 + ... + (xk − µ)²pk ]  =  √( Σ (xi − µ)²pi )
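The two formulas translate directly into code. A sketch (mean_sd is a hypothetical helper), checked here against the two-dice example that follows:

```python
from math import sqrt

def mean_sd(dist):
    """dist is a list of (value, probability) pairs."""
    mu = sum(x * p for x, p in dist)
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in dist))
    return mu, sigma

# The two-dice distribution from the next example: P(X = k) = (6 - |k - 7|)/36
two_dice = [(k, (6 - abs(k - 7)) / 36) for k in range(2, 13)]
print(mean_sd(two_dice))  # (7.0, 2.415...)
```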

Example (Two dice): µ and σ

Two dice are rolled. Using the distribution table above, the mean is

µ = 2 · (1/36) + 3 · (2/36) + 4 · (3/36) + ... + 12 · (1/36) = 7.

This shouldn't be too much of a surprise, as we've seen in class that the mean for rolling one die is 3.5. The standard deviation:

σ = √[ (2 − 7)² · (1/36) + (3 − 7)² · (2/36) + ... + (12 − 7)² · (1/36) ] ≈ 2.415.

Interpretation: If an experiment is repeated many, many times, and the average of outcomes, x̄, is computed, it is expected to be close to 7. An interpretation of the standard deviation is not so clear.

Box Models

Many statistical questions can be framed in terms of drawing tickets from a box.

Box Model: A model framing a statistical question as drawing tickets (with or without replacement) from a box. The tickets are labeled with numerical values linked to a random variable.

Example: Suppose we are to flip one fair coin. We can model the possible outcomes in terms of drawing from a box: there are two tickets in the box, the first labeled 1 and the second labeled 0. Flipping a head is equivalent to drawing a 1 from the box, and a tail is equivalent to drawing a 0. If we were to flip the coin multiple times, we would draw multiple tickets from the box. As coin flips are independent, we draw from the box with replacement.


A Box Model is a version of a distribution table for a random variable. It allows one to simplify a question to an easily visualized experiment (a common theme in mathematics... a large number of questions can be framed as manipulating objects found in a box). A box model should be used when a question requires classifying and counting. If you are interested in a certain subset of values for a random variable, label the tickets corresponding to your interests as 1 and the remaining tickets as 0.

Example: If you are interested in the occurrence of 3 or 4 when rolling a die, your box model contains six tickets: two labeled 1 (for the faces 3 and 4) and four labeled 0.


When we wish to describe the expected value and standard deviation for a box model, we use the formulas for discrete random variables, but we have a simpler way to visualize these ideas. The expected value of a random variable is the average of the tickets occupying the box model. The standard deviation of a random variable is the standard deviation of the tickets.

The Sum of n Independent Outcomes

Since individual outcomes of an experiment are values of a random variable X , the sum of multiple outcomes will also be a random variable. What are the mean and standard deviation of this variable? We will use the following notation: n is the number of repetitions of an experiment. µ and σ are the mean and standard deviation of the random variable X which describes a single outcome of an experiment.

In terms of a box model, n is the number of draws we make from our box. Because we want independent events, drawing from the box is with replacement.


When the same experiment is repeated independently n times, the following is true for the sum of outcomes:

The expected value of the sum of n independent outcomes of an experiment: nµ.

The standard error of the sum of n independent outcomes of an experiment: √n · σ.

The second part of the above rule is called the Square Root Law. Note that the above rule is true for any sequence of independent random variables, discrete or continuous!

Example (Test)

A test has 20 multiple choice questions. Each question has 5 possible answers, one of which is correct. A correct answer is worth 5 points, so the total possible score is 100. A student answers all questions by guessing at random. What is the expected value and standard deviation of their total score?

Solution: Let X be the number of points earned on one question. Then X is a random variable which has the following distribution:

Value of X    0    5
Probability  4/5  1/5


The mean of X is

µ = 0 · (4/5) + 5 · (1/5) = 1.

The standard deviation of X is

σ = √[ (0 − 1)² · (4/5) + (5 − 1)² · (1/5) ] = 2.

We are interested in the mean and standard error of the sum of scores from 20 questions. The questions are independent. We have:

The mean of the sum = 20 · 1 = 20 points,
The standard error of the sum = √20 · 2 ≈ 8.94 points.
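A simulation can corroborate these figures (a sketch; the seed and the 100,000 repetitions are arbitrary choices):

```python
import random
from math import sqrt

random.seed(20)  # arbitrary seed
# Score one 20-question test: each question pays 5 points with chance 1/5
scores = [sum(5 * (random.random() < 0.2) for _ in range(20))
          for _ in range(100_000)]

m = sum(scores) / len(scores)
sd = sqrt(sum((s - m) ** 2 for s in scores) / len(scores))
print(round(m, 2), round(sd, 2))  # close to 20 and 8.94
```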

A Computational Trick!

When there are just two numbers, x1 and x2, in the distribution of X, the distribution's standard deviation, σ, can be computed by using the following short-cut formula:

σ = |x1 − x2| · √(p1p2)

where pi is the probability of xi.

Example (Test): The standard deviation for the distribution of points earned by guessing on one question can be easily found as

σ = |0 − 5| · √( (4/5) · (1/5) ) = 5 · (2/5) = 2,

which coincides with what we found before.


This trick will be used often when we are interested in classifying and counting. These problems are framed as a box model with tickets being either 0 or 1. In that case

σ = √( (fraction of 1's) × (fraction of 0's) )

For the die example above (two 1's and four 0's):

σ = √( (2/6) × (4/6) ) ≈ 0.47

Standard Error

An observed value differs from the expected value by the chance error. The likely size of the chance error is given by the standard error.

The sum of the points earned from randomly selecting answers on our 20-question test is expected to be 20, give or take the standard error of 8.94 points.

The Binomial Setting

1. There are a fixed number n of repeated trials.
2. The trials are independent. In other words, the outcome of any particular trial is not influenced by previous outcomes.
3. The outcome of every trial falls into one of just two categories, which for convenience we call success and failure.
4. The probability of a success, call it p, is the same for each trial.
5. It is the total number of successes that is of interest, not their order of occurrence.

NOTE: The Binomial Setting can be framed as a box model with only 1’s and 0’s where draws are performed with replacement.


The binomial setting is appropriate under the sampling WITH replacement scheme. When sampling WITHOUT replacement, removing objects from the population changes the probability of success for the next trial and introduces dependence between the trials.

However, when the population is large enough:

- Removing a few items from it doesn't change the proportion of successes and failures significantly.
- Successive trials are nearly independent.

Conclusion: We can apply the binomial setting to sampling-without-replacement problems when the population is large.

Binomial Coefficients

The number of ways in which exactly k successes can occur in the n trials of a binomial experiment can be found as

C(n, k) = n! / (k!(n − k)!),

where n! = 1 · 2 · 3 · ... · n. The exclamation mark is read "factorial" and C(n, k) is read "n choose k".

Example: Students are given a list of nine books and told that they will be examined on the contents of five of them. How many combinations of five books are possible?

C(9, 5) = 9! / (5!4!) = 126.

There are 126 possible combinations of 5 books out of 9 books.
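Both routes to this answer are available in Python's standard library (a sketch):

```python
from math import comb, factorial

# Via the factorial formula...
print(factorial(9) // (factorial(5) * factorial(4)))  # 126
# ...or directly, using math.comb (Python 3.8+)
print(comb(9, 5))                                     # 126
```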

The Binomial Distribution

Let X denote the number of successes under the binomial setting. Then X is a random variable which may take values 0, 1, 2, 3, ..., n. In particular:

X = 0 means no successes in n trials; only failures were observed.
X = n means the outcomes of all n trials are successes.
X = 5 means 5 successes in n trials.

It turns out that X has a special discrete distribution which is called the binomial distribution. The probabilities of values of X are computed as

P(X = k) = C(n, k) · p^k · (1 − p)^(n−k),   k = 0, 1, 2, ..., n.

So the binomial distribution is a probability distribution of a random variable X which has 2 parameters: p (probability of success) and n (the number of trials).
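The formula is a one-liner in code. A sketch (binom_pmf is a hypothetical helper), sanity-checked against the 3-coin example:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n, p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example (3 coins): number of heads is binomial with n = 3, p = 1/2
print([binom_pmf(k, 3, 0.5) for k in range(4)])  # [0.125, 0.375, 0.375, 0.125]
```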

Binomial Mean and Standard Deviation

Let X be a binomial random variable with parameters n (number of trials) and p (probability of success in each trial). Then the mean and standard deviation of X are

µ = np,

σ = √( np(1 − p) ).

Example (Heart Attack)

The Helsinki Heart Study asked whether the anticholesterol drug gemfibrozil reduces heart attacks. The Helsinki study planned to give gemfibrozil to about 2000 men aged 40 to 55 and a placebo to another 2000. The probability of a heart attack during the five-year period of the study for men this age is about 0.04. What are the mean and standard deviation of the number of heart attacks that will be observed in one group if the treatment does not change this probability?

Solution:

There are 2000 independent observations, each having probability p = 0.04 of a heart attack. The count X of heart attacks has a binomial distribution.

µ = np = 2000 · 0.04 = 80,

σ = √( np(1 − p) ) = √( 2000 · 0.04 · (1 − 0.04) ) ≈ 8.76.

In fact, there were 84 heart attacks among the 2035 men actually assigned to the placebo, quite close to the mean. The gemfibrozil group of 2046 men suffered only 56 heart attacks. This is evidence that the drug does reduce the chance of a heart attack.

Example (Light Bulbs)

For a lot of 1,000,000 light bulbs the probability of a defective bulb is 0.01. What is the probability that there are 20,000 defective bulbs in a lot?

Solution: There are n = 1,000,000 bulbs (trials). The probability of a defect (success) for each bulb is p = 0.01. Let X be the number of defective bulbs out of n = 1,000,000. Then X has a binomial distribution. The expected value of X is

µ = np = 1,000,000 · 0.01 = 10,000.

The standard deviation of X is

σ = √( np(1 − p) ) = √( 1,000,000 · 0.01 · 0.99 ) = √9900 ≈ 99.5.

We want to compute the probability that X = 20,000.


We have

P(X = 20,000) = C(1,000,000, 20,000) · 0.01^20,000 · (1 − 0.01)^(1,000,000 − 20,000)
              = [ 1,000,000! / (20,000! · 980,000!) ] · 0.01^20,000 · 0.99^980,000.

If you try to compute 1,000,000! directly, it may crash your computer! How can we compute the desired probability then? We may approximate it using the normal approximation to the binomial distribution.

Normal Approximation to the Binomial Distribution

Consider the probability histograms for binomial distributions with different values of n and p.



We can see that some of the probability histograms are bell-shaped. This suggests that the binomial distribution may be approximated by the normal distribution for certain combinations of n and p.


In particular, observe the following: For a fixed p, the larger the sample size n, the better the normal approximation to the binomial distribution. For a fixed n, the closer p is to 0.5, the better the normal approximation to the binomial distribution.

NORMAL APPROXIMATION for BINOMIAL COUNTS

Let X be a random variable which has a binomial distribution with parameters n and p. When n is large, the distribution of X is approximately normal: X is approximately normal with mean np and standard deviation √( np(1 − p) ). As a rule, we will use this approximation for values of n and p that satisfy np ≥ 10 and n(1 − p) ≥ 10.

A few remarks are in order:

- The above normal approximation is easy to remember because it says that X is approximately normal with its usual mean and standard deviation.
- The true distribution of X is binomial, not normal. The normal distribution is just a good approximation of the binomial probability histogram when the conditions in the rule are satisfied.
- The normal approximation to the binomial distribution consists in replacing the actual probability histogram with the normal curve before computing areas.

Example (College Commute)

A recent survey on a college campus revealed that 40% of the students live at home and commute to college. If a random sample of 320 students is questioned, what is the probability of finding at least 130 students who live at home?

Solution: Let X be the count of students who live at home in a sample of size n = 320. Then X has a binomial distribution with parameters n = 320 and p = 0.4. We need to compute

P(X ≥ 130) = P(X = 130) + P(X = 131) + ... + P(X = 320)
           = C(320, 130) · 0.4^130 · 0.6^190 + C(320, 131) · 0.4^131 · 0.6^189 + ... + C(320, 320) · 0.4^320 · 0.6^0.

The above computation is cumbersome. Can we use the normal approximation to the binomial distribution to compute P(X ≥ 130)?


Check the conditions of the rule:

np = 320 · 0.4 = 128, n(1 − p) = 320(1 − 0.4) = 192

Since both np and n(1 − p) are greater than 10, we can use the normal approximation to the binomial distribution.

√( np(1 − p) ) = √( 320 · 0.4 · 0.6 ) = √76.8 ≈ 8.76.

Then X is approximately normal with mean np = 128 and SD √( np(1 − p) ) ≈ 8.76. How good is the normal approximation to the binomial distribution?


The figure below displays the probability histogram of the binomial distribution (bar graph) with the density curve of the approximating normal distribution superimposed.


Both distributions have the same mean and standard deviation, and both the area under the histogram and the area under the curve are 1. The normal curve fits the histogram very well.

The normal approximation to the probability of at least 130 students is the area under the normal curve to the right of

z = (130 − np) / √( np(1 − p) ) = (130 − 128) / 8.76 ≈ 0.228.

Rounding z to 0.25 and using the normal table,

P(X ≥ 130) ≈ P(Z ≥ 0.25) = (100% − 19.74%) / 2 = 40.13% (or 0.4013).


The actual binomial probability that there are at least 130 students who live at home can be computed:

P(X ≥ 130) = P(X = 130) + P(X = 131) + ... + P(X = 320) = 0.4306.

The above probability is the area under the binomial probability histogram to the right of the value x = 130. Note that the actual and approximate probabilities are quite close. The normal approximation works well in this case.
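Both numbers can be reproduced with the standard library alone (a sketch; the exact sum is feasible here because n = 320 is small):

```python
from math import comb, erf, sqrt

n, p, cutoff = 320, 0.4, 130

# Exact binomial tail: sum of P(X = k) for k = 130, ..., 320
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(cutoff, n + 1))

# Normal approximation: P(Z >= z) via the standard normal CDF
mu, sd = n * p, sqrt(n * p * (1 - p))
z = (cutoff - mu) / sd
approx = 1 - 0.5 * (1 + erf(z / sqrt(2)))

print(round(exact, 4), round(approx, 4))  # 0.4306 and roughly 0.41
```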

Example (Light Bulbs)

Recall that for a lot of 1,000,000 bulbs the probability of a defective bulb is 0.01. We want to find the probability that there are 20,000 defective bulbs in the lot. We justified that X, a count of defective bulbs in a lot, is a binomial random variable with parameters p = 0.01 and n = 1,000,000. We also concluded that we cannot handle the direct binomial computation

P(X = 20,000) = C(1,000,000, 20,000) · 0.01^20,000 · (1 − 0.01)^(1,000,000 − 20,000),

since computing factorials of large numbers is simply not feasible! The probability histogram for the binomial distribution with parameters p = 0.01 and n = 1,000,000 has 1,000,001 bars centered at the values 0, 1, 2, ..., 1,000,000.


The chance that X = 20,000 is the area of the bar over 20,000. We want to use the normal distribution to approximate the area of this rectangle. The base of this rectangle goes from 19,999.5 to 20,000.5 on the count-of-defective-bulbs scale. In standard units the base of the rectangle goes from

z1 = (19,999.5 − 10,000) / 99.5 ≈ 100.497  to  z2 = (20,000.5 − 10,000) / 99.5 ≈ 100.508.

Then P(X = 20,000) ≈ P(100.497 ≤ Z ≤ 100.508) ≈ 0.

There is almost no chance that the lot will contain EXACTLY 20,000 defective light bulbs. We can expect the normal approximation to be quite accurate in this example since

np = 1,000,000 · 0.01 = 10,000 > 10  and  n(1 − p) = 1,000,000 · 0.99 = 990,000 > 10.

Continuous Random Variables

Continuous random variables take values in intervals on the number line.

Examples of continuous random variables: Weight; Height; Volume.

The probability distributions of continuous random variables are given by probability density functions p(x) which are displayed graphically as density curves. Most density curves are smooth curves without sharp edges.

Probabilities of Events for Continuous Distributions

(Figure: probabilities of events shown as areas under the density curve.)

More about Continuous Random Variables

For any continuous distribution:

- The total area under the density curve is 1.
- The probability density is a non-negative function: p(x) ≥ 0.
- The probability of any event is the area under the density curve and above the values of X that make up the event.
- The probability that X, having a continuous distribution, takes any particular value x is zero:

P(X = x) = 0.

Explanation: X takes infinitely many values, so the probability that X takes any particular value is 0. A continuous random variable assigns probabilities to intervals of outcomes rather than to individual outcomes.

Important Continuous Distributions

There are many important continuous distributions, including: the Normal distribution, the Chi-square distribution (often written χ² distribution), the t-distribution, the F-distribution, the Exponential distribution, the Uniform distribution, and the Weibull distribution.

In this unit we will discuss just the normal distribution. This course will also discuss the Chi-square distribution and the t-distribution.

The Normal Distribution

We have used the standard normal curve for computations of chances or percents of observations many times in the past.

Figure: The standard normal curve.

Parameters of the Normal Distribution

Many random variables, such as height, weight, reaction time to a medication, and scores on the IQ test, have distributions of the bell-shaped type which can be reasonably approximated by a normal curve. The normal distribution should be viewed as a convenient model for many random variables.

There is a whole family of normal distributions, not just the standard normal distribution with µ = 0 and σ = 1 that appears in the normal table. Different members of the normal family have different values of the parameters, µ and σ. All normal distributions have the same overall bell shape. The parameters µ and σ transform the bell as follows:

- The value of µ determines the centering of the distribution. Changing µ merely translates the curve to the right or the left.
- The value of σ determines the spread of the bell. Larger values of σ correspond to greater spread of the curve.

The Normal Density

We express the fact that the random variable X has the normal distribution with parameters µ and σ in the following way:

X ∼ N(µ, σ²).

The parameters of the normal distribution play the following role: µ is the mean of the distribution (of X ), σ is the standard deviation of the distribution (of X ).

Special case: The normal table gives the probabilities P(−∞ < Z < z), where Z ∼ N(0, 1).

The Role of σ

(Figure: normal curves with different values of σ.)

Several Normal Distributions

Several normal distributions with different means and standard deviations are shown on the plot below.

Normal Distribution: Range of Possible Values

Even though the normal density curve is defined on the whole number line, the probability that X will fall outside of the interval µ ± 3σ is very small (0.27%).

The Standard Normal Distribution

Let X ∼ N(µ, σ²).

The new random variable Z = (X − µ)/σ has the standard normal distribution: its mean is 0 and its standard deviation is 1.

In practice, this means that all probability computations for normal distributions may be performed using just the standard normal distribution.

P(x1 < X < x2) = P(z1 < Z < z2)

where z1 and z2 are computed by

z1 = (x1 − µ)/σ  and  z2 = (x2 − µ)/σ.
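A sketch of this standardization in code, using the error function in place of the normal table (phi and normal_prob are hypothetical helpers):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z < z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(x1, x2, mu, sigma):
    z1, z2 = (x1 - mu) / sigma, (x2 - mu) / sigma
    return phi(z2) - phi(z1)

# About 68% of the probability lies within one SD of the mean
print(round(normal_prob(-1, 1, 0, 1), 4))  # 0.6827
```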

Universal Use of the Standard Normal Distribution

The relationship between the areas involved is shown below:

Thus, we only need the standard normal table to compute probabilities of events for ALL normal distributions.

Percentages of Observations and Probabilities

In Chapter 5 we used the normal curve to approximate the data’s histograms. We computed the area under the normal curve to approximate percentage of observations which fall into the corresponding interval on the data’s histogram.

Now we are using the normal curve to compute the probability that a normally distributed random variable X will take values from a particular interval.

Caution

When we are in the Binomial Setting we can use our rule to know if the distribution approximates a normal curve. This holds specifically because we are summing the draws from our box. It is not true in general that all operations associated with drawing from a box will eventually normalize.

Product Histograms


If we look at the probability histograms for the product of several single die rolls, we do not tend towards a nice bell shape:
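A quick simulation illustrates the claim (a sketch; the seed, the 5 rolls, and the sample size are arbitrary choices):

```python
import random
from math import prod

random.seed(7)  # arbitrary seed
# Product of 5 die rolls, repeated many times
products = [prod(random.randint(1, 6) for _ in range(5))
            for _ in range(100_000)]

mean = sum(products) / len(products)
median = sorted(products)[len(products) // 2]
print(round(mean), median)  # mean far above the median
```

In a typical run the mean of the products sits far above the median, the signature of a right-skewed, non-bell distribution.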

The Central Limit Theorem (CLT)

The Central Limit Theorem: When drawing at random with replacement from a box, the probability histogram for the sum will approximately follow the normal curve, even if the contents of the box do not. The larger the number of draws, the better the normal approximation.



The sample size n should be at least 30 (n ≥ 30) before the normal approximation can be used.

For symmetric population distributions, the distribution of x̄ is usually normal-like even at n = 10 or more. For very skewed population distributions, larger values of n may be needed to overcome the skewness.

The Central Limit Theorem (CLT) at Work

Distribution of a Sum

The following result is a consequence of the CLT. Suppose that:

- We repeat the same experiment n times,
- The outcomes of repeated experiments are independent,
- Every outcome (described with the random variable X) has mean µ and standard deviation σ.

When n is large enough (n ≥ 30), the distribution of the SUM OF OUTCOMES x1 + x2 + ... + xn is approximately

N(nµ, nσ²),

which means a normal distribution with mean nµ and SD √n · σ. If the distribution of X is normal, the above result holds exactly.
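The rule can be checked by simulation on a lopsided 0-1 box (a sketch; the box contents, seed, and repetition count are arbitrary choices):

```python
import random
from math import sqrt

random.seed(42)  # arbitrary seed
box = [0, 0, 0, 1]  # mu = 0.25, sigma = sqrt(0.25 * 0.75), about 0.433
n, reps = 100, 20_000

sums = [sum(random.choice(box) for _ in range(n)) for _ in range(reps)]
m = sum(sums) / reps
sd = sqrt(sum((s - m) ** 2 for s in sums) / reps)
print(round(m, 2), round(sd, 2))  # near n*mu = 25 and sqrt(n)*sigma, about 4.33
```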

Example (Airline Passengers)

In response to the increasing weight of airline passengers, the Federal Aviation Administration in 2003 told airlines to assume that passengers average 190 pounds in summer, including clothing and carry-on baggage. But passengers vary! A reasonable standard deviation is 35 pounds. Assume that the weights of airline passengers are normally distributed. Question: A commuter plane carries 19 passengers. What is the probability that the total weight of the passengers exceeds 4000 pounds? Solution: We have n = 19 passengers. The mean weight of a passenger is µ = 190, and the standard deviation is σ = 35. Let x1, x2,..., x19 denote the passengers’ weights.


We want to find the probability that the sum of weights,

x1 + x2 + ... + x19, exceeds 4000 pounds. Since the distribution of X (weight of an airline passenger) is normal, the distribution of the sum is normal with

mean = 19 · 190 = 3610  and  SD = √19 · 35 ≈ 152.56.

Computing the z-score for 4000:

z = (4000 − 3610) / 152.56 ≈ 2.56.

We have:

P(x1 + x2 + ... + x19 > 4000) = P(Z > 2.56) = 100% − 99.48% = 0.52%.
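The whole computation fits in a few lines (a sketch; the tiny difference from 0.52% comes from rounding z to two decimals above):

```python
from math import erf, sqrt

n, mu, sigma, limit = 19, 190, 35, 4000

total_mean = n * mu          # 3610
total_sd = sqrt(n) * sigma   # about 152.56

z = (limit - total_mean) / total_sd      # about 2.56
prob = 1 - 0.5 * (1 + erf(z / sqrt(2)))  # P(Z > z)
print(f"z = {z:.2f}, P = {prob:.4f}")    # about 0.0053, i.e. roughly 0.5%
```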
