There Are Two Important Forms of Statistical Inference
Chapter 8 – Estimation
There are two important forms of statistical inference:
• Estimation (confidence intervals)
• Hypothesis testing
Statistical Inference – drawing conclusions about populations based on samples of the population
Parameter (unknown): μ, σ
A parameter is a number that describes the population of interest. Since we usually cannot examine the entire population of interest, parameters are generally unknown.
Statistic (known): x̄, s
A statistic is a number that is computed from sample data. We often use a statistic to estimate an unknown population parameter.
Sample statistic and population parameter
Notation
• μ = population mean (unknown)
• x̄ = sample mean (computed from the data we have on hand from a sample of the population)
• σ = population standard deviation (unknown)
• s = sample standard deviation (computed from the data we have on hand from a sample of the population)
x̄ estimates μ; s estimates σ.
Point Estimate
• A point estimate of a population parameter is an estimate of the parameter given by a single number.
• x̄ is a point estimate of μ.
• s is a point estimate of σ.
Sampling Variability
Example: What is the average weight of women 5’1” tall between the ages of 21 and 45? The American Medical Association takes a sample of 1000 women between the ages of 21 and 45 with height 5’1”. They find that the mean weight is x̄ = 136.2 lbs.
Question: If our goal is to estimate the mean weight of the population, how should we deal with the fact that different samples yield different estimates of the mean weight?
The basic fact that the value of a sample statistic varies in (hypothetical) repeated random sampling is called sampling variability.
Example: If another sample of 1000 women was chosen from the same population of 5’1” women between 21 and 45 years old, the value of x̄ would almost certainly be different, that is, something other than 136.2 lbs.
Answer: Allow a margin of error that takes sampling variability into account.
Confidence Intervals
Confidence intervals are generally of the form:
point estimate ± margin of error
Question: Why should we estimate μ, true population mean, with an interval of numbers? Why not just use the point estimate as our estimate of μ?
Answer: (1) Using an interval estimate (i.e. confidence interval) takes sampling variability into consideration, and (2) we can attach a level of confidence to an interval estimate which we cannot do with a point estimate.
A confidence interval for μ has two parts:
1) A margin of error says how close x̄ lies to μ.
2) A level of confidence says what percent of all possible samples satisfy the margin of error.
A confidence level, c, is any value between 0 and 1 that corresponds to the area under the standard normal curve between −z_c and +z_c.
Margin of Error
• Even if we take a very large sample, x̄ may differ from µ.
Critical Values
An interval of numbers has a left endpoint and a right endpoint: (lower bound, upper bound).
For a confidence level c, the critical value z_c is the number such that the area under the standard normal curve between −z_c and +z_c equals c (your confidence level).
Example - Which of the following correctly expresses the confidence interval shown below?
[Figure: confidence interval; answer choices a) to d)]
Common Confidence Levels
• Area = 0.90 (90%): between −1.645 and 1.645
• Area = 0.95 (95%): between −1.96 and 1.96
• Area = 0.98 (98%): between −2.33 and 2.33
• Area = 0.99 (99%): between −2.58 and 2.58
Notice that as the confidence level increases, the interval gets wider.
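The critical values for the common confidence levels can be checked numerically. The sketch below uses Python's standard library; the helper name `z_critical` is ours, not part of the notes. It relies on the fact that a central area of c leaves (1 − c)/2 in each tail.

```python
from statistics import NormalDist


def z_critical(c: float) -> float:
    """Return z_c such that the area under N(0, 1) between -z_c and +z_c is c."""
    if not 0 < c < 1:
        raise ValueError("confidence level c must be strictly between 0 and 1")
    # A central area of c leaves (1 - c) / 2 in each tail,
    # so z_c is the (1 + c) / 2 quantile of the standard normal curve.
    return NormalDist().inv_cdf((1 + c) / 2)


for c in (0.90, 0.95, 0.98, 0.99):
    print(f"c = {c:.2f}  z_c = {z_critical(c):.3f}")
```

To three decimals this gives 1.645, 1.960, 2.326 and 2.576; printed tables usually round the last two to 2.33 and 2.58.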
When constructing a confidence interval, you must decide on the risk you are willing to take of being wrong.
A confidence interval is “wrong” if it doesn’t contain the true value of the population parameter.
• 99% confidence ==> 1% chance of being wrong
• 95% confidence ==> 5% chance of being wrong
• 90% confidence ==> 10% chance of being wrong
How confidence intervals behave
• High confidence says that our method almost always gives correct answers.
• A small margin of error says that we have pinned down the parameter quite precisely.
The margin of error determines the width of the confidence interval.
1) The margin of error is larger for higher confidence levels. To obtain a smaller margin of error from the same data, you must be willing to accept lower confidence.
2) The margin of error is larger for smaller sample sizes.
3) The margin of error is larger for populations that have lots of variability.
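These three behaviors can be seen in a small numerical sketch. It assumes the usual z-based margin of error for a mean with known σ, E = z_c·σ/√n; the function name and the values σ = 10 and n = 50 are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(c: float, sigma: float, n: int) -> float:
    """z-based margin of error E = z_c * sigma / sqrt(n) (sigma assumed known)."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    return z_c * sigma / sqrt(n)


# 1) Higher confidence level -> larger margin of error (same sigma and n).
for c in (0.90, 0.95, 0.99):
    print(f"c = {c:.2f}: E = {margin_of_error(c, sigma=10, n=50):.2f}")

# 2) Smaller sample size -> larger margin of error.
for n in (25, 100, 400):
    print(f"n = {n}: E = {margin_of_error(0.95, sigma=10, n=n):.2f}")

# 3) More variable population -> larger margin of error.
for sigma in (5, 10, 20):
    print(f"sigma = {sigma}: E = {margin_of_error(0.95, sigma=sigma, n=50):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.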
Interpreting confidence levels
Take 95% confidence, for example.
Practical Interpretation: We are 95% confident that the mean gain in score is between 18.9 and 25.1 points.
Statistical Interpretation: If we repeatedly take random samples of size n from the population and construct 95% confidence intervals for each sample, then in the long run 95% of these confidence intervals will capture the true value of μ. Our sample is either one of the 95% for which the calculated interval captures μ, or one of the unlucky 5% that do not.
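The long-run statement can be illustrated by simulation. This is a sketch under stated assumptions: a normal population with hypothetical values μ = 50 and σ = 8, samples of size 30, σ treated as known, and a fixed random seed. The function `coverage` is ours.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu: float, sigma: float, n: int, c: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated z-intervals (sigma known) that capture the true mean mu."""
    rng = random.Random(seed)
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sigma / sqrt(n)  # same margin of error for every sample
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - E < mu < xbar + E:
            hits += 1
    return hits / trials


rate = coverage(mu=50.0, sigma=8.0, n=30, c=0.95, trials=2000)
print(f"fraction of intervals capturing mu: {rate:.3f}")
```

The printed fraction should land close to 0.95. Any single interval either captures μ or misses it; the 95% describes the method, not one interval.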
The idea of a sampling distribution
Take many samples from the same population.
Collect the x̄’s from all the samples.
Display the distribution of the x̄’s (in a histogram, for example).
The histogram will be bell-shaped and symmetric, centered at the population mean.
The sampling distribution of x̄ is a normal distribution! (This is exact when the population itself is normal, and a good approximation for large samples from other populations.)
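The steps above can be tried out in a short simulation. This is only a sketch: the normal population with μ = 100 and σ = 15, the sample size of 25, and the function name `sample_means` are all hypothetical choices for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def sample_means(mu: float, sigma: float, n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Draw num_samples samples of size n from N(mu, sigma) and return each sample mean."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(num_samples)]


mu, sigma, n = 100.0, 15.0, 25
xbars = sample_means(mu, sigma, n, num_samples=5000)

# The x-bars should center on mu and spread out by about sigma / sqrt(n).
print(f"mean of the x-bars: {fmean(xbars):.2f}  (population mean {mu})")
print(f"sd of the x-bars:   {stdev(xbars):.2f}  (sigma / sqrt(n) = {sigma / sqrt(n):.2f})")
```

The x̄’s vary much less than individual observations do: here their standard deviation is near 3, not 15.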
Sampling Distribution
The sampling distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Facts about the sampling distribution of x̄
These facts describe how x̄ varies from one sample to the next:
1) In repeated sampling, x̄ will sometimes fall above the true value of μ and sometimes below it, but there is no systematic tendency for x̄ to overestimate or underestimate μ. The sampling distribution of x̄ is centered at μ, and so x̄ is called an unbiased estimator of μ.
2) The values of x̄ from larger samples are less variable than those from smaller samples. The standard deviation of the sampling distribution of x̄ is σ/√n.
Mean of x̄: $\mu_{\bar{x}} = \mu$
Standard deviation of x̄: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Confidence interval for μ for 95% Confidence
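• σ is known: $\bar{x} \pm z_c \frac{\sigma}{\sqrt{n}}$, with $z_c = 1.96$ for 95% confidence
• σ is unknown: $\bar{x} \pm t_c \frac{s}{\sqrt{n}}$, with $t_c$ taken from the t distribution with n − 1 degrees of freedom
If σ is known, we use z_c.
If σ is unknown, we have s instead, so we use t_c.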
Maximal Margin of Error
• Since µ is unknown, the margin of error |x̄ − µ| is unknown.
• Using confidence level c, we can say that x̄ differs from µ by at most:
$E = z_c \frac{\sigma}{\sqrt{n}}$
The Probability Statement
$P(-E < \bar{x} - \mu < E) = c$
• In words, c is the probability that the sample mean, x̄, will differ from the population mean, µ, by at most E, the margin of error.
Confidence Intervals
A ‘c’ confidence interval for µ is an interval computed from sample data in such a way that c is the probability of generating an interval containing the actual value of µ.
Example - For a population of domesticated geese, the standard deviation of the mass is 1.3 kg. A sample of 45 geese has a mean mass of 5.7 kg. Find the confidence interval for the population mean at the 95% confidence level.
Notice that we have σ (population standard deviation), so we can use z_c.
Calculator: STAT, TEST, Z-Interval, Choose STAT
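As a check on the calculator result, the geese interval can be computed directly. This is a minimal sketch assuming the z-interval x̄ ± z_c·σ/√n; the function name `z_interval` is ours.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, c: float) -> tuple[float, float]:
    """Confidence interval for mu when the population standard deviation sigma is known."""
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sigma / sqrt(n)  # maximal margin of error
    return xbar - E, xbar + E


low, high = z_interval(xbar=5.7, sigma=1.3, n=45, c=0.95)
print(f"95% CI for mu: ({low:.2f}, {high:.2f}) kg")
```

This prints an interval of about (5.32, 6.08) kg, that is, 5.7 ± 0.38.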
Critical Thinking
• Since x̄ is a random variable, so are the endpoints x̄ ± E.
• After the confidence interval is numerically fixed for a specific sample, it either does or does not contain µ.
• If we repeated the confidence interval process by taking multiple random samples of equal size, some intervals would capture µ and some would not!
• The equation $P(\bar{x} - E < \mu < \bar{x} + E) = c$ states that the proportion of all intervals containing µ will be c.
Estimating µ When σ is Unknown
• In most cases, researchers will have to estimate σ with s (the standard deviation of the sample).
• When s replaces σ, the standardized sample mean no longer follows the standard normal distribution; it follows a distribution called the Student’s t distribution.
The t Distribution
Assume that x has a normal distribution with mean μ. For samples of size n with sample mean x̄ and sample standard deviation s, the t variable
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
has a Student’s t distribution with degrees of freedom = n − 1.
Properties of the t-distribution
• It is bell shaped, symmetric, and centered at zero.
• There is more area in the tails of the t-distribution than there is in the N(0,1) distribution.
• The t-distribution is really a family of density curves, one for each value of the degrees of freedom.
• As the degrees of freedom get larger and larger, the t-density curve looks more and more like the N(0,1) curve.
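For different levels of confidence, t_c is chosen so that the central area under the t curve with n − 1 degrees of freedom equals c:
• For a 95% confidence interval: area 0.025 in each tail
• For a 90% confidence interval: area 0.05 in each tail
• For a 99% confidence interval: area 0.005 in each tail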
Example - Find the t-value for the following data: x̄ = 55.2, μ = 58.1, s = 4.2, n = 40
a) −27.62
b) −0.11
c) −8.95
d) −4.37
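The t-value in this example can be worked out directly from the definition t = (x̄ − μ)/(s/√n). A minimal sketch, with a function name of our own choosing:

```python
from math import sqrt


def t_statistic(xbar: float, mu: float, s: float, n: int) -> float:
    """t = (xbar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu) / (s / sqrt(n))


t = t_statistic(xbar=55.2, mu=58.1, s=4.2, n=40)
print(f"t = {t:.2f}, degrees of freedom = {40 - 1}")
```

This gives t ≈ −4.37, which matches choice d).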
Use Table 6 of Appendix II to find the critical value t_c for a confidence level c. Degrees of freedom, df, are the row headings. Confidence levels, c, are the column headings.
Maximal Margin of Error
• If we are using the t distribution:
$E = t_c \frac{s}{\sqrt{n}}$
What Distribution Should We Use?
Notes on Calculator:
For Normal Distribution, σ is unknown
• Test statistic: t_obs
• Test: Stat⟶Test⟶T-test
• Confidence interval: Stat⟶Test⟶T-Interval
For Normal Distribution, σ is known
• Test statistic: z_obs
• Test: Stat⟶Test⟶Z-test
• Confidence interval: Stat⟶Test⟶Z-Interval
For Proportion
• Test: Stat⟶Test⟶1-Prop Z-test
• Confidence interval: Stat⟶Test⟶1-Prop Z-Interval
Example - A study was done to determine the average number of homes that a homeowner owns in his or her lifetime. For the 60 homeowners surveyed, the sample average was 4.2 and the sample standard deviation was 2.1. Calculate the 95% confidence interval for the true average number of homes that a person owns in his or her lifetime.
Notice that we have s (sample standard deviation), so we use t_c.
Calculator: STAT, TEST, T-Interval, Choose STAT
Example - A study was done to determine the average number of homes that a homeowner owns in his or her lifetime. Suppose that this time σ is known to be 2.8. Assume that we collect a sample of 60 homeowners and compute the sample average to be 4.2. Calculate the 95% confidence interval for the true average number of homes that a person owns in his or her lifetime.
Notice that we have σ (population standard deviation), so we use z_c.
Calculator: STAT, TEST, Z-Interval, Choose STAT
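The two homeowner examples differ only in which standard deviation and which critical value are used. The sketch below computes both intervals as x̄ ± (critical value)·(standard deviation)/√n. The value t_c ≈ 2.001 for 59 degrees of freedom is an assumption taken from software; a printed table may only offer a nearby row such as df = 60. The helper `interval` is ours.

```python
from math import sqrt
from statistics import NormalDist


def interval(xbar: float, sd: float, n: int, critical_value: float) -> tuple[float, float]:
    """xbar +/- critical_value * sd / sqrt(n); sd is s (with t_c) or sigma (with z_c)."""
    E = critical_value * sd / sqrt(n)
    return xbar - E, xbar + E


# First example: sigma unknown, s = 2.1, so use t_c with n - 1 = 59 degrees of freedom.
T_C_DF59 = 2.001  # assumed value of t_c for c = 0.95, df = 59
t_low, t_high = interval(xbar=4.2, sd=2.1, n=60, critical_value=T_C_DF59)

# Second example: sigma = 2.8 is known, so use z_c.
z_c = NormalDist().inv_cdf(0.975)
z_low, z_high = interval(xbar=4.2, sd=2.8, n=60, critical_value=z_c)

print(f"t-interval: ({t_low:.2f}, {t_high:.2f})")
print(f"z-interval: ({z_low:.2f}, {z_high:.2f})")
```

Under these assumptions the t-interval is about (3.66, 4.74) and the z-interval about (3.49, 4.91). The second is wider because σ = 2.8 is larger than s = 2.1, not because z_c is larger than t_c.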
Example: The numbers of advertisements seen or heard in one week for 30 randomly selected people in the United States are listed below. Construct a 95% confidence interval for the true mean number of advertisements.
Notice that we have s (sample standard deviation), so we use t_c.
Calculator: STAT, TEST, T-Interval, Choose DATA
598 494 441 595 728 690 684 486 735 808 481 298 135 846 764 317 649 732 582 677 734 588 590 540 673 727 545 486 702 703
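With raw data, x̄ and s have to be computed before the interval. A sketch of that workflow follows, assuming t_c ≈ 2.045 for 29 degrees of freedom at 95% confidence (the usual table value, stated here as an assumption).

```python
from math import sqrt
from statistics import fmean, stdev

# Advertisements seen or heard in one week by 30 randomly selected people.
ads = [598, 494, 441, 595, 728, 690, 684, 486, 735, 808,
       481, 298, 135, 846, 764, 317, 649, 732, 582, 677,
       734, 588, 590, 540, 673, 727, 545, 486, 702, 703]

T_C_DF29 = 2.045  # assumed table value of t_c for c = 0.95, df = 29

n = len(ads)
xbar = fmean(ads)   # sample mean
s = stdev(ads)      # sample standard deviation (divides by n - 1)
E = T_C_DF29 * s / sqrt(n)
low, high = xbar - E, xbar + E

print(f"n = {n}, x-bar = {xbar:.1f}, s = {s:.1f}")
print(f"95% CI for mu: ({low:.1f}, {high:.1f})")
```

`statistics.stdev` uses the n − 1 divisor, which is the sample standard deviation s that the t-interval calls for.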
Estimating p in the Binomial Distribution
• We will use large-sample methods in which the sample size, n, is fixed.
• We assume the normal curve is a good approximation to the binomial distribution if both np > 5 and nq = n(1 – p) > 5.
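Point Estimates in the Binomial Case
$\hat{p} = \frac{r}{n}$ (r = number of successes in n trials), $\hat{q} = 1 - \hat{p}$
Margin of Error
• The magnitude of the difference between the actual value of p and its estimate p̂ is the margin of error, |p̂ − p|.
The Distribution of p̂
• For large samples, the distribution of p̂ is well approximated by a normal distribution.
A Probability Statement
$P\left(-z_c \sqrt{\frac{pq}{n}} < \hat{p} - p < z_c \sqrt{\frac{pq}{n}}\right) \approx c$
with confidence level c, as before.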
Example - Suppose that 800 students were randomly selected from the student body of 20,000 and are given shots to prevent a certain type of flu. All 800 students were exposed to the flu, and 600 of them did not get the flu. Let p represent the probability that the shot will be successful for any single student selected at random from the entire population of 20,000.
a) What are the point estimates for p and q? What are the values of n and r?
b) Is the number of trials large enough to justify a normal approximation to the binomial?
c) Find a 99% confidence interval for p.
Calculator: STAT, TESTS, A: 1-Prop Z Int. The value of x = r
Example: A survey of 300 fatal accidents showed that 123 were alcohol related. Construct a 98% confidence interval for the proportion of fatal accidents that were alcohol related.
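Both proportion examples follow the same steps: compute p̂ = r/n, check the large-sample condition, then form p̂ ± z_c·√(p̂q̂/n). The sketch below assumes that standard large-sample interval, with p̂ and q̂ standing in for p and q in the np > 5 and nq > 5 check. The function name is ours.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(r: int, n: int, c: float) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for a large-sample confidence interval for p."""
    p_hat = r / n
    q_hat = 1 - p_hat
    # Normal approximation check, using the point estimates in place of p and q.
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("sample too small for the normal approximation")
    z_c = NormalDist().inv_cdf((1 + c) / 2)
    E = z_c * sqrt(p_hat * q_hat / n)
    return p_hat, p_hat - E, p_hat + E


# Flu shots: r = 600 successes out of n = 800, 99% confidence.
flu = proportion_interval(r=600, n=800, c=0.99)
# Fatal accidents: r = 123 alcohol related out of n = 300, 98% confidence.
accidents = proportion_interval(r=123, n=300, c=0.98)

print("flu:       p-hat = {:.3f}, CI = ({:.3f}, {:.3f})".format(*flu))
print("accidents: p-hat = {:.3f}, CI = ({:.3f}, {:.3f})".format(*accidents))
```

For the flu example p̂ = 0.75 and q̂ = 0.25, so np̂ = 600 and nq̂ = 200 both exceed 5, and the 99% interval is roughly 0.711 to 0.789. For the accident data p̂ = 0.41 and the 98% interval is roughly 0.344 to 0.476.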
Choosing Sample Sizes
• When designing statistical studies, it is good practice to decide in advance:
  – The confidence level
  – The maximal margin of error
• Then, we can calculate the required minimum sample size to meet these goals.
Sample Size for Estimating μ
$n = \left(\frac{z_c \sigma}{E}\right)^2$
If σ is unknown, use σ from a previous study or conduct a pilot study to obtain s, and use s in place of σ. Always round n up to the next integer!
Sample Size for Estimating p
If we have a preliminary estimate for p, use the following:
$n = p(1 - p)\left(\frac{z_c}{E}\right)^2$
If we have no preliminary estimate for p, use the following modification:
$n = \frac{1}{4}\left(\frac{z_c}{E}\right)^2$
Example – A wildlife study is designed to find the mean weight of salmon caught by an Alaskan fishing company. A preliminary study of a random sample of 50 salmon showed s = ___ pounds. How large a sample should be taken to be 90% confident that the sample mean, x̄, is within 0.20 pounds of the true mean weight μ?
Example: A researcher wishes to estimate the number of households with two cars. How large a sample is needed in order to be 98% confident that the sample proportion will not differ from the true proportion by more than 5%? A previous study indicates that the proportion of households with two cars is 19%.
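The sample-size calculations can be sketched as below, assuming the standard formulas n = (z_c·σ/E)² for a mean and n = p(1 − p)(z_c/E)² for a proportion, with p(1 − p) replaced by 1/4 when no preliminary estimate is available. The function names are ours. The critical value z_c = 2.33 is the rounded 98% value from the Common Confidence Levels list. The σ = 2.0 and E = 0.5 in the mean example are hypothetical, since the salmon example's value of s is not given here.

```python
from math import ceil
from typing import Optional


def n_for_mean(z_c: float, sigma: float, E: float) -> int:
    """Minimum sample size so that x-bar is within E of mu; always rounded up."""
    return ceil((z_c * sigma / E) ** 2)


def n_for_proportion(z_c: float, E: float, p: Optional[float] = None) -> int:
    """Minimum sample size so that p-hat is within E of p; always rounded up."""
    # With no preliminary estimate, use the largest possible p(1 - p), which is 1/4.
    pq = 0.25 if p is None else p * (1 - p)
    return ceil(pq * (z_c / E) ** 2)


# Households with two cars: 98% confidence, E = 0.05, preliminary p = 0.19.
print(n_for_proportion(z_c=2.33, E=0.05, p=0.19))  # 335
# Same goal with no preliminary estimate for p.
print(n_for_proportion(z_c=2.33, E=0.05))          # 543
# Hypothetical mean example: 95% confidence, sigma = 2.0, E = 0.5.
print(n_for_mean(z_c=1.96, sigma=2.0, E=0.5))      # 62
```

The required n is sensitive to rounding of z_c: using the unrounded 98% value of about 2.326 in the household example gives 334 instead of 335, so it is worth stating which critical value was used.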