Part V - Chance Variability
Dr. Joseph Brennan
Math 148, BU
Law of Averages
In Chapter 13 we discussed the Kerrich coin-tossing experiment. Kerrich was a South African who spent World War II as a prisoner of the Nazis. He spent his time flipping a coin 10,000 times, faithfully recording the results.
Law of Averages: If an experiment is independently repeated a large number of times, the percentage of occurrences of a specific event E will be close to the theoretical probability of the event occurring, but off by some amount - the chance error.
As the coin toss was repeated, the percentage of heads approached its theoretical expectation: 50%.
Caution
The Law of Averages is commonly misunderstood as the Gambler's Fallacy:
"By some magic everything will balance out. With a run of 10 heads, a tail is becoming more likely."
This is false. After a run of 10 heads, the probability of tossing a tail is still 50%!
In fact, the number of heads above half quickly increases as the experiment proceeds. A gambler betting on tails and hoping for balance would be devastated: after 10,000 tosses, tails appeared about 134 fewer times than heads.
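As a quick illustration, here is a minimal Python sketch in the spirit of Kerrich's experiment (a fresh simulated run, not his actual data; the seed and variable names are mine). It shows the percentage of heads settling near 50% even while the count of heads above half drifts around and can keep growing.

import random

random.seed(148)  # arbitrary seed so the run is reproducible

heads = 0
for n in range(1, 10001):
    heads += random.randint(0, 1)      # 1 = head, 0 = tail
    if n in (100, 1000, 10000):
        pct = 100 * heads / n
        excess = heads - n / 2         # chance error in absolute terms
        print(f"{n:>6} tosses: {pct:5.2f}% heads, {excess:+.1f} above half")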
In our coin-flipping experiment, the number of heads will be around half the number of tosses, plus or minus the chance error.
As the number of tosses goes up, the chance error gets larger in absolute terms. However, when viewed relatively, the chance error as a percentage decreases.
Sample Spaces
Recall that a sample space S lists all the possible outcomes of a study.
Example (3 coins): We can record an outcome as a string of heads and tails, such as HHT. The corresponding sample space is S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.
It is often more convenient to deal with outcomes as numbers, rather than as verbal statements. Suppose we are interested in the number of heads. Let X denote the number of heads in 3 tosses. For instance, if the outcome is HHT, then X = 2. The possible values of X are 0, 1, 2, and 3. For every outcome from S, X will take a particular value:

Outcome  HHH  HHT  HTH  THH  TTH  THT  HTT  TTT
X          3    2    2    2    1    1    1    0
Random Variable
Random Variable: An unknown quantity subject to random change. Often a random variable is the unknown numerical result of a study. A random variable has a numerical sample space where each outcome has an assigned probability; the assigned probabilities are not necessarily equal. The quantity X in the previous example is a random variable because its value is unknown unless the tossing experiment is performed.
Definition: A random variable is an unknown numerical result of a study.
Mathematically, a random variable is a function which assigns a numerical value to each outcome in a sample space S.
Example (3 coins)
We have two different sample spaces for our 3 coin experiment:
S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
S∗ = {0, 1, 2, 3}
The sample space S describes 8 equally likely outcomes for our coin flips, while the sample space S∗ describes 4 outcomes that are not equally likely. Recall that S∗ represents the values of the random variable X, the number of heads resulting from three coin flips.
P(X = 0) = P(TTT) = (1/2) · (1/2) · (1/2) = 1/8
P(X = 1) = P(HTT or TTH or THT) = 3/8
P(X = 2) = 3/8,  P(X = 3) = 1/8
S∗ does not contain information about the order of heads and tails.
Discrete and Continuous Random Variables
Discrete Random Variables: A discrete random variable has a number of possible values which can be listed. Mathematically we say the number of possible values is countable. The variable X in Example (3 coins) is discrete. Simple actions are discrete: rolling dice, flipping coins, dealing cards, drawing names from a hat, spinning a wheel, . . .
Continuous Random Variables: A continuous random variable takes values in an interval of numbers. It is impossible to list or count all the possible values of a continuous random variable. Mathematically we say the number of possible values is uncountable. For data on heights of people, the average height x̄ is a continuous random variable which takes on values from some interval, say, [0, 200] (in inches).
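To make the passage from S to S∗ concrete, the following Python sketch (my own illustration, not from the slides) enumerates the 8 equally likely outcomes, counts heads in each, and recovers the probabilities of X.

from itertools import product
from collections import Counter

# Enumerate the 8 equally likely outcomes in S and count heads in each,
# recovering the distribution of X on S* = {0, 1, 2, 3}.
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]
counts = Counter(outcome.count("H") for outcome in outcomes)

for x in sorted(counts):
    print(f"P(X = {x}) = {counts[x]}/8")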
Probability Distributions
Any random variable X, discrete or continuous, can be described with:
A probability distribution.
A mean and standard deviation.
The probability distribution of a random variable X is defined by specifying the possible values of X and their probabilities. For discrete random variables the probability distribution is given by the probability table and is represented graphically as the probability histogram. For continuous random variables the probability distribution is given by the probability density function and is represented graphically by the density curve. Recall that we discussed density curves in Part II.
The Mean of a Random Variable X
In Part II (Descriptive Statistics) we discussed the mean and standard deviation, x̄ and s, of data sets to measure the center and spread of the observations. Similar definitions exist for random variables: The mean of the random variable X, denoted µ, measures the centrality of the probability distribution.
The mean µ is computed from the probability distribution of X as a weighted average of the possible values of X with weights being the probabilities of these values.
The Expected Value
The mean µ of a random variable X is often called the expected value of X . It means that the observed value of a random variable is expected to be around its expected value; the difference is the chance error. In other words,
observed value of X = µ + chance error
We never expect a random variable X to be exactly equal to its expected value µ. The likely size of the chance error can be determined by the standard deviation, denoted σ. The standard deviation σ measures the distribution’s spread and is a quantity which is computed from the probability distribution of X .
Random Variable X and Population
A population of interest is often characterized by the random variable X . Example: Suppose we are interested in the distribution of American heights. The random variable X (height) describes the population (US people).
The distribution of X is called the population distribution, and the distribution parameters, µ and σ, are the population parameters.
Population parameters are fixed constants which are usually unknown and need to be estimated. A sample (data set) should be viewed as values (realizations) of the random variable X drawn from the probability distribution. The sample mean x̄ and standard deviation s estimate the unknown population mean µ and standard deviation σ.
Discrete Random Variables
The distribution of a discrete random variable X is summarized in the distribution table:
Value of X   x1  x2  x3  ...  xk
Probability  p1  p2  p3  ...  pk
The symbols xi represent the distinct possible values of X and pi is the probability associated to xi .
p1 + p2 + ... + pk = 1 (or 100%)
This is due to all possible values of X being listed in the sample space S = {x1, x2,..., xk }.
The events X = xi and X = xj, i ≠ j, are disjoint since the random variable X cannot take two distinct values at the same time.
Example (Fish)
A resort on a lake claims that the distribution of the number of fish X in the daily catch of an experienced fisherman is given below.
x         0     1     2     3     4     5     6     7
P(X = x)  0.02  0.08  0.10  0.18  0.25  0.20  0.15  0.02

Find the following:
(a) P(X ≥ 5) = 0.37
(b) P(2 < X < 5) = 0.43
(c) y if P(X ≤ y) = 0.2: y = 2
(d) y if P(X > y) = 0.37: y = 4
(e) P(X ≠ 5) = 1 − 0.20 = 0.80
(f) P(X < 2 or X = 6) = 0.25
(g) P(X < 2 and X > 4) = 0
(h) P(X = 9) = 0
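Several of these answers can be checked with a short Python sketch (the dictionary and helper function below are my own illustration, not part of the example): add up the table probabilities of the values satisfying each event.

dist = {0: 0.02, 1: 0.08, 2: 0.10, 3: 0.18,
        4: 0.25, 5: 0.20, 6: 0.15, 7: 0.02}

def prob(event):
    """P(event): add up the probabilities of the values x satisfying event(x)."""
    return sum(p for x, p in dist.items() if event(x))

print(round(prob(lambda x: x >= 5), 2))            # (a) 0.37
print(round(prob(lambda x: 2 < x < 5), 2))         # (b) 0.43
print(round(prob(lambda x: x != 5), 2))            # (e) 0.80
print(round(prob(lambda x: x < 2 or x == 6), 2))   # (f) 0.25
print(round(prob(lambda x: x < 2 and x > 4), 2))   # (g) 0.0
print(round(prob(lambda x: x == 9), 2))            # (h) 0.0, since 9 is not a possible value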
Probability Histograms
The graphical display of the probability distribution of a discrete random variable X is called the probability histogram. There are k bars, where k is the number of possible values of X.
The i-th bar is centered at xi, has unit width, and has height pi. The areas of the probability histogram display the assignment of probabilities to possible values of X.
Example (3 coins): The distribution table for X, the number of heads after 3 coin flips, is given below:

X     0    1    2    3
P(X)  1/8  3/8  3/8  1/8
Example (3 coins): The Probability Histogram
The probability histogram for the 3 coins example is shown below.
Probability Histograms and Data Histograms
Do not confuse the probability histogram and the data (empirical) histogram!
The probability histogram is a theoretical histogram which shows the probabilities of possible outcomes. Each bar on the probability histogram shows the probability of a certain outcome.
The data histogram is an empirical histogram which shows the distribution of observed outcomes. Each bar on the data histogram represents the observed frequency of that outcome.
As the probability is a long-run frequency, we should think of the probability histogram as an idealized picture of the results of very many trials.
Example (Two Dice)
Two dice are rolled. Find the distribution of the total and plot its probability histogram.
Solution: Let X denote the sum on the two dice. There are 11 possible values of X.

Value of X    2     3     4     5     6     7     8     9    10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
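The table can also be produced by brute-force enumeration; the following Python sketch (my own illustration) tallies the 36 equally likely ordered pairs of faces.

from collections import Counter

# Enumerate the 36 equally likely (die 1, die 2) pairs and tally each sum,
# reproducing the distribution table above.
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))

for total in sorted(counts):
    print(f"P(X = {total:2d}) = {counts[total]}/36")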
Example (Two Dice)
A computer simulated throwing a pair of dice, and the experiment was repeated 100 times, 1000 times, and then 10,000 times. The empirical histograms for the sums are plotted below:
We can see that the empirical histogram converges (gets closer and closer) to the probability histogram as the number of repetitions increases.
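The same convergence can be seen in a few lines of Python (a minimal sketch with an arbitrary seed, tracking only the sum 7 for brevity): the observed relative frequency of a 7 approaches its probability 6/36 ≈ 0.167 as the number of rolls grows.

import random
from collections import Counter

random.seed(7)  # arbitrary seed so the run is reproducible
for reps in (100, 1000, 10000):
    sums = Counter(random.randint(1, 6) + random.randint(1, 6) for _ in range(reps))
    print(f"{reps:>5} rolls: fraction of 7s = {sums[7] / reps:.3f}")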
Discrete Random Variable: µ and σ
Mean: The mean µ of a discrete random variable is found by multiplying each possible value by its probability and adding together all the products:
µ = x1p1 + x2p2 + ... + xkpk = Σ_{i=1}^{k} xi pi
Standard Deviation: The standard deviation σ of a discrete random variable is found with the aid of µ:
σ = √((x1 − µ)² p1 + (x2 − µ)² p2 + ... + (xk − µ)² pk) = √(Σ_{i=1}^{k} (xi − µ)² pi)
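These two formulas translate directly into code. The sketch below (my own helper, not from the slides) computes µ and σ for any distribution table and checks it on the 3-coins example, where µ = 1.5 and σ ≈ 0.866.

from math import sqrt

def mean_sd(values, probs):
    """Mean and standard deviation of a discrete random variable,
    computed directly from the weighted-average formulas above."""
    mu = sum(x * p for x, p in zip(values, probs))
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))
    return mu, sigma

# Number of heads in 3 fair coin tosses (the earlier example):
print(mean_sd([0, 1, 2, 3], [1/8, 3/8, 3/8, 1/8]))  # (1.5, 0.866...)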
Example (Two Dice): µ and σ
Two dice are rolled. The distribution table for this random variable:

Value of X    2     3     4     5     6     7     8     9    10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The mean:
µ = 2 · (1/36) + 3 · (2/36) + 4 · (3/36) + ... + 12 · (1/36) = 7.
This shouldn't be too much of a surprise, as we've seen in class that the mean for rolling one die is 3.5.
The standard deviation:
σ = √((2 − 7)² · (1/36) + (3 − 7)² · (2/36) + ... + (12 − 7)² · (1/36)) ≈ 2.415.
Interpretation: If an experiment is repeated many, many times and the average of the outcomes, x̄, is computed, it is expected to be close to 7. An interpretation of the standard deviation is not so clear.
Box Models
Many statistical questions can be framed in terms of drawing tickets from a box.
Box Model: A model framing a statistical question as drawing tickets (with or without replacement) from a box. The tickets are labeled with numerical values linked to a random variable.
Example: Suppose we are to flip one fair coin. We can model the possible outcomes in terms of drawing from a box: there are two tickets in the box, the first labeled 1 and the second labeled 0. Flipping a head is equivalent to drawing a 1 from the box and flipping a tail is equivalent to drawing a 0. If we were to flip the coin multiple times, we would draw multiple tickets from the box. As coin flips are independent, we draw from the box with replacement.
A Box Model is a version of a Distribution Table for a random variable. Box models allow one to simplify a question to an easily visualized experiment (a common theme in mathematics: a large number of questions can be framed as manipulating objects found in a box). A box model should be used when a question requires classifying and counting. If you are interested in a certain subset of values for a random variable, label the tickets corresponding to your interests as 1 and the remaining tickets as 0.
Example: If you are interested in the occurrence of 3 or 4 when rolling a die, your box model contains six tickets: two labeled 1 (for the outcomes 3 and 4) and four labeled 0.
When we wish to describe the expected value and standard deviation for a box model, we use the formulas for discrete random variables, but we have a simpler way to visualize these ideas. The expected value of one draw is the average of the tickets occupying the box model. The standard deviation of one draw is the standard deviation of the tickets.
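For the die example above, the box view gives the same numbers with almost no algebra; a minimal Python sketch (ticket list and naming are mine):

from statistics import mean, pstdev  # pstdev = population standard deviation

# Box for "did the die land on 3 or 4?": two tickets labeled 1, four labeled 0.
# One draw has expected value = average of the tickets and SD = SD of the tickets.
tickets = [1, 1, 0, 0, 0, 0]
print(mean(tickets))    # 1/3 ≈ 0.333
print(pstdev(tickets))  # ≈ 0.47, matching the short-cut formula shown later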
The Sum of n Independent Outcomes
Since individual outcomes of an experiment are values of a random variable X , the sum of multiple outcomes will also be a random variable. What are the mean and standard deviation of this variable? We will use the following notation: n is the number of repetitions of an experiment. µ and σ are the mean and standard deviation of the random variable X which describes a single outcome of an experiment.
In terms of a box model, n is the number of draws we make from the box. Because we want independent events, we draw from the box with replacement.
When the same experiment is repeated independently n times, the following is true for the sum of the outcomes:
The expected value of the sum of n independent outcomes of an experiment: nµ
The standard error of the sum of n independent outcomes of an experiment: √n · σ
The second part of the above rule is called the Square Root Law. Note that the above rule is true for any sequence of independent random variables, discrete or continuous!
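A rough simulation check of these two rules (a sketch of my own, using sums of n = 100 die rolls, where one roll has µ = 3.5 and σ = √(35/12) ≈ 1.708):

import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)  # arbitrary seed so the run is reproducible
n = 100
sums = [sum(random.randint(1, 6) for _ in range(n)) for _ in range(5000)]

print(mean(sums))               # close to n times 3.5 = 350
print(stdev(sums))              # close to the Square Root Law value below
print(sqrt(n) * sqrt(35 / 12))  # sqrt(n) times sigma, about 17.08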
Example (Test)
Example 4 (Test): A test has 20 multiple choice questions. Each question has 5 possible answers, one of which is correct. A correct answer is worth 5 points, so the total possible score is 100. A student answers all questions by guessing at random. What is the expected value and standard deviation of their total score?
Solution: Let X be the number of points earned on one question. Then X is a random variable which has the following distribution:
Value of X    0    5
Probability  4/5  1/5
Value of X    0    5
Probability  4/5  1/5

The mean of X is
µ = 0 · (4/5) + 5 · (1/5) = 1
The standard deviation of X is
σ = √((0 − 1)² · (4/5) + (5 − 1)² · (1/5)) = 2
We are interested in the mean and standard error of the sum of scores from 20 questions. The questions are independent. We have:
The mean of the sum = 20 · 1 = 20 points,
The standard error of the sum = √20 · 2 ≈ 8.94 points.
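These numbers can be checked by simulating many guessing students; the Python sketch below (seed, helper name, and repetition count are my own choices) should give results close to 20 and 8.94.

import random
from statistics import mean, stdev

random.seed(148)  # arbitrary seed so the run is reproducible

def guessed_score():
    # 20 questions, chance 1/5 of getting each right, 5 points per correct answer.
    return sum(5 for _ in range(20) if random.random() < 1 / 5)

scores = [guessed_score() for _ in range(10000)]
print(mean(scores))   # close to the expected value of 20 points
print(stdev(scores))  # close to the standard error of about 8.94 points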
A Computational Trick!
When there are just two numbers, x1 and x2, in the distribution of X, the distribution's standard deviation, σ, can be computed by using the following short-cut formula:
σ = |x1 − x2| · √(p1 p2)
where pi is the probability of xi.
Example (Test): The standard deviation for the distribution of points earned by guessing on one question can be easily found as
σ = |0 − 5| · √((4/5) · (1/5)) = 5 · (2/5) = 2,
which coincides with what we found before.
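A quick numerical check that the short-cut agrees with the general formula for this two-value distribution (a throwaway sketch; the variable names are mine):

from math import sqrt

# The one-question score again: general formula vs. two-value short-cut.
x1, x2, p1, p2 = 0, 5, 4 / 5, 1 / 5
mu = x1 * p1 + x2 * p2
general = sqrt((x1 - mu) ** 2 * p1 + (x2 - mu) ** 2 * p2)
shortcut = abs(x1 - x2) * sqrt(p1 * p2)
print(general, shortcut)   # both 2.0, up to tiny floating-point error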
This trick will be used often when we are interested in classifying and counting. These problems are framed as a box model with tickets being either 0 or 1. For such a box,
σ = √((fraction of 1's) × (fraction of 0's))
For the die example (success = rolling a 3 or 4):
σ = √((2/6) × (4/6)) ≈ 0.47
Standard Error
An observed value differs from the expected value by the chance error. The likely size of the chance error is given by the standard error.
The sum of the points earned from randomly selecting answers on our 20-question test is expected to be 20 points, give or take the standard error of 8.94 points.
The Binomial Setting
1. There is a fixed number n of repeated trials.
2. The trials are independent. In other words, the outcome of any particular trial is not influenced by previous outcomes.
3. The outcome of every trial falls into one of just two categories, which for convenience we call success and failure.
4. The probability of a success, call it p, is the same for each trial.
5. It is the total number of successes that is of interest, not their order of occurrence.
NOTE: The Binomial Setting can be framed as a box model with only 1’s and 0’s where draws are performed with replacement.
The binomial setting is appropriate under the sampling WITH replacement scheme. When sampling WITHOUT replacement, removing objects from the population changes the probability of success for the next trial and introduces dependence between the trials.
However, when the population is large enough:
Removing a few items from it doesn't change the proportion of successes and failures significantly.
Successive trials are nearly independent.
Conclusion: We can apply the binomial setting to sampling-without-replacement problems when the population is large.
Binomial Coefficients
The number of ways in which exactly k successes can occur in the n trials of a binomial experiment can be found as
(n choose k) = n! / (k! (n − k)!),