Understanding Inference – Confidence Intervals I

Questions about the Assignment

If your answer is wrong but you show your work, you can get more partial credit.

Understanding Inference: Confidence Intervals I
- Population parameter versus sample statistic
- Uncertainty in estimates
- Sampling distribution
- Confidence interval

The Big Picture

[Figure: a sample is drawn from the population (sampling); statistical inference goes from the sample back to the population.]

Statistic vs. Parameter

A sample statistic is a number computed from sample data (e.g., sample mean: the mean income of the people in the sample).

A population parameter is a number that describes some aspect of a population (e.g., population mean: the mean income of the entire population).

We usually have a sample statistic and want to make inferences about the population parameter.

Sample statistics versus population parameters:

Quantity             Sample statistic   Population parameter
Mean                 x̄                  μ (mu)
Proportion           p̂                  p
Standard deviation   s                  σ (sigma)
Correlation          r                  ρ (rho)
Slope                b                  β (beta)

Obama’s Approval Rating
Source: http://www.gallup.com/poll/113980/Gallup-Daily-Obama-Job-Approval.aspx

Gallup surveyed 1,500 Americans between June 9 and 11, 2012, and 49% of these people approved of the job Barack Obama was doing as president.

What is the population? All Americans (over 300 million people)
What is the sample size? 1,500
Is this a categorical or quantitative variable? Categorical
For categorical variables, what sample statistic are we interested in? The sample proportion
Sample statistic (sample proportion): p̂ = 0.49
Population parameter (population proportion): p = ?

Based on this sample statistic, what do you think is the true proportion of Americans who approve of the job Barack Obama is doing as president?

Point and Interval Estimates

The sample statistic gives a point estimate (a single number) for the population parameter.

Usually, it is more useful to provide an interval estimate, which gives a range of plausible values for the population parameter:

interval estimate = point estimate ± margin of error

How do we determine the margin of error?

Obama’s Approval Rating

Point Estimate: p̂ = 0.49

Interval Estimate: 0.49 ± 0.03 (point estimate ± margin of error) = (0.46, 0.52)

Plausible range: between 46% and 52% of Americans currently approve of the job Obama is doing as president.

Important Points

The population parameter is a fixed value. Sample statistics vary from sample to sample, so they will not match the population parameter exactly.

For a given sample statistic, what are plausible values for the population parameter? How much uncertainty surrounds the sample statistic?

It depends on how much the sample statistic varies from sample to sample!

Reese’s Pieces

What proportion of Reese’s Pieces are orange?

Let’s Run Our Own Study

When conducting a study, we need to select a sample size. Typically, we take only one sample, but because we’re interested in knowing how much our sample statistic varies from sample to sample, we’ll take multiple samples.

Each person: take a random sample of 10 Reese’s Pieces.

What is our population? 1,500 Reese’s Pieces
What is our sample size? 10
How many samples did we take? 6
What is your sample distribution (i.e., orange vs. not orange)?
What is your sample proportion?
What is the range of plausible values for the population proportion?
What is the mean proportion of the sampling distribution?

The sampling distribution will be centered around the true population parameter.
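The class activity can also be mimicked in code. The following Python sketch is a minimal simulation under stated assumptions: the bag of 1,500 pieces is assumed to be 45% orange (the real proportion is unknown; 45% is a made-up value), and the helper name `sample_proportion` is hypothetical. Each of six “students” draws 10 pieces without replacement and records the proportion that are orange.

```python
import random
from statistics import mean

# Hypothetical bag: 1,500 pieces, 675 (45%) orange. The 45% is an assumed value.
N_ORANGE = 675
population = ["orange"] * N_ORANGE + ["not orange"] * (1500 - N_ORANGE)

def sample_proportion(pop, n, rng):
    """Hypothetical helper: draw n pieces without replacement, return the share that is orange."""
    sample = rng.sample(pop, n)
    return sample.count("orange") / n

rng = random.Random(2012)  # fixed seed so the run is reproducible

# Six "students", each taking one random sample of 10 pieces.
p_hats = [sample_proportion(population, 10, rng) for _ in range(6)]
print("sample proportions:", p_hats)
print("mean of the sample proportions:", round(mean(p_hats), 3))
```

Because only six samples are taken, their mean can land some distance from 0.45; with many more samples, the center of the sampling distribution settles on the true proportion.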

Sampling Distribution

A sampling distribution is the distribution of sample statistics computed from different samples of the same size taken from the same population.

In the Reese’s Pieces sampling distribution, what does each dot represent?
A. One Reese’s piece
B. One sample statistic

The sampling distribution is different from the sample distribution.
- The sample distribution is the distribution of values for variable x collected from one sample.
- The sampling distribution is the distribution of sample statistics p̂ collected from multiple samples. It shows us how the sample statistic varies from sample to sample.

Sampling Distribution: Shape and Center

If samples are randomly selected and the sample size is large enough, the sampling distribution will be:
- symmetric and bell-shaped
- centered at the value of the population parameter

Sampling Distribution: Spread

To assess the accuracy of our point estimate, we need to know how much the sample statistic varies from sample to sample (i.e., we need to know the spread of the sampling distribution).

In the Reese’s Pieces sampling distribution we generated, what is the range of plausible values for the population proportion?

We use the spread of the sampling distribution to determine the margin of error for a statistic. As the standard deviation (i.e., spread) of the sampling distribution decreases, the margin of error will decrease.

Sampling Distribution: Standard Deviation

What is a standardized way to measure the spread of a distribution? Calculate the standard deviation of the sample statistics in the sampling distribution:

$$ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

where n = the number of samples taken, x_i = the sample statistic for sample i, and x̄ = the mean value of all the sample statistics.

As the variability (i.e., spread) in the sampling distribution decreases, the uncertainty in the estimate decreases.
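As a worked example of the standard deviation formula, the sketch below computes the SD of six sample proportions. The six values are hypothetical (they are not the class’s actual results). Python’s `statistics.stdev` also divides by n - 1, so it should agree with the hand-coded formula.

```python
import math
import statistics

# Hypothetical sample proportions from six samples of 10 pieces (made-up values).
p_hats = [0.4, 0.5, 0.3, 0.6, 0.5, 0.4]

n = len(p_hats)          # n = the number of samples taken
p_bar = sum(p_hats) / n  # mean value of all the sample statistics

# s = sqrt( sum of (x_i - x_bar)^2 / (n - 1) )
s = math.sqrt(sum((x - p_bar) ** 2 for x in p_hats) / (n - 1))

print(f"mean = {p_bar:.3f}, SD = {s:.3f}")
print("statistics.stdev agrees:", math.isclose(s, statistics.stdev(p_hats)))
```

With these made-up values the mean is 0.45 and the SD is about 0.105.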

The Importance of Sample Size

The sample size influences the spread of the sampling distribution (i.e., the variation in sample statistics), which influences the margin of error for our estimate of the population parameter.

If we increased the sample size to 100, the standard deviation of the sampling distribution will...
A. increase
B. decrease
C. remain the same

...and the margin of error for our point estimate will...
A. increase
B. decrease
C. remain the same

With larger samples, each sample statistic (i.e., the proportion of orange pieces) would be closer to the population proportion and thus closer to the other sample statistics.

Sample Size

[Figure: three sampling distributions, for samples of size n = 50, n = 200, and n = 1,000. Each dot represents a sample statistic.]

The number of samples taken to generate these sampling distributions is the same. What varies for each sampling distribution is the size of the sample taken to calculate the sample statistic.
- Once samples are large enough, the sample size does not noticeably affect the shape of the sampling distribution.
- The sample size does not affect the center of the sampling distribution.
- The sample size does affect the spread of the sampling distribution: as the sample size increases, the spread decreases.
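To see the sample-size effect without the applet, here is a rough Python simulation. It assumes a hypothetical true proportion of 0.45 and treats the bag as so large that each draw is orange with probability 0.45 (i.e., sampling with replacement). It uses the sample sizes from the figure and the same number of samples (1,000, an arbitrary choice) for each size; the printed numbers are simulated, not the figure’s data.

```python
import random
import statistics

TRUE_P = 0.45       # hypothetical proportion of orange pieces in a very large bag
NUM_SAMPLES = 1000  # the same number of samples for every sample size

rng = random.Random(1)  # fixed seed so the run is reproducible

def sample_proportion(n):
    """Proportion of orange pieces in one random sample of size n."""
    # Each draw is orange with probability TRUE_P (treats the bag as effectively infinite).
    return sum(rng.random() < TRUE_P for _ in range(n)) / n

centers, spreads = {}, {}
for n in (50, 200, 1000):
    stats = [sample_proportion(n) for _ in range(NUM_SAMPLES)]
    centers[n] = statistics.mean(stats)
    spreads[n] = statistics.stdev(stats)
    print(f"n = {n:>5}: center = {centers[n]:.3f}, spread (SD) = {spreads[n]:.4f}")
```

Expect the center to stay near 0.45 for all three sample sizes while the SD shrinks as n grows.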

Hypothesis

Increasing the sample size will cause the standard deviation of the sampling distribution to decrease.

Let’s Test Our Hypothesis
http://www.rossmanchance.com/applets/Reeses3/ReesesPieces.html

Random Samples

If you take random samples, the sampling distribution will be centered around the true population parameter.

If sampling bias exists (i.e., if you do not take random samples), the sampling distribution may give inaccurate information about the true population parameter.
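A small simulation can illustrate the difference between random and biased sampling. It is a sketch under made-up assumptions: the true proportion is 0.45, and the biased scheme (a hypothetical one in which an orange piece is twice as likely to be picked as any other piece) is invented purely for illustration.

```python
import random
import statistics

TRUE_P = 0.45  # hypothetical true proportion of orange pieces
N = 50         # sample size
REPS = 2000    # number of samples for each scheme

rng = random.Random(7)  # fixed seed so the run is reproducible

def random_sample_prop():
    """Random sampling: every piece has the same chance of being picked."""
    return sum(rng.random() < TRUE_P for _ in range(N)) / N

def biased_sample_prop():
    """Hypothetical biased sampling: an orange piece is twice as likely to be picked."""
    picks = rng.choices(["orange", "not orange"], weights=[2 * TRUE_P, 1 - TRUE_P], k=N)
    return picks.count("orange") / N

random_center = statistics.mean([random_sample_prop() for _ in range(REPS)])
biased_center = statistics.mean([biased_sample_prop() for _ in range(REPS)])
print(f"true p = {TRUE_P}")
print(f"center with random sampling = {random_center:.3f}")
print(f"center with biased sampling = {biased_center:.3f}")
```

Under random sampling the center lands near 0.45; under this particular bias it lands near 0.62 (0.90 / 1.45), no matter how many samples are taken.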

Confidence Intervals

A confidence interval for a population parameter is an interval computed from sample data that will contain the true population parameter for a specified proportion of all samples.

The confidence level is the proportion of samples whose intervals contain the true population parameter. It indicates how confident we are that our interval contains the population parameter.

A 95% confidence interval will contain the true population parameter for 95% of all samples. For any one interval, we say we are 95% confident that the true population parameter falls within this range.

[Figure: sampling distribution centered on the population proportion p, marking the middle 95%, a sample statistic with its confidence interval, and a sample statistic more than 2 SDs from p.]

- The population parameter (p) is fixed. It is typically not known.
- The sample statistic (x_i) is random. It depends on the sample.
- The confidence interval (x_i ± 2 SD)* is random. It depends on the sample statistic.
- The sampling distribution consists of the sample statistics and is centered on the population parameter.
- About 95% of the sample statistics will fall within 2 standard deviations of the population parameter, so about 95% of the sample intervals will contain the population parameter.

*The standard deviation used to calculate the confidence interval is the standard deviation of the sampling distribution (not the sample distribution).

Applet: http://bcs.whfreeman.com/ips4e/cat_010/applets/confidenceinterval.html
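The claim that about 95% of the sample intervals contain the population parameter can be checked by simulation. This Python sketch assumes a hypothetical true proportion of 0.45 and samples of size 200. It estimates the standard error as the SD of 2,000 simulated sample proportions, builds p̂ ± 2 SE for every sample, and counts how many intervals capture p.

```python
import random
import statistics

TRUE_P = 0.45  # hypothetical population proportion (fixed, "unknown" in practice)
N = 200        # size of each sample
REPS = 2000    # number of samples, and therefore of intervals

rng = random.Random(42)  # fixed seed so the run is reproducible

def sample_prop():
    """Proportion of orange pieces in one random sample of size N."""
    return sum(rng.random() < TRUE_P for _ in range(N)) / N

p_hats = [sample_prop() for _ in range(REPS)]
se = statistics.stdev(p_hats)  # SD of the sampling distribution = standard error

# Build p_hat ± 2 SE for every sample and check whether it captures TRUE_P.
hits = sum(p - 2 * se <= TRUE_P <= p + 2 * se for p in p_hats)
coverage = hits / REPS
print(f"SE = {se:.4f}; {coverage:.1%} of the {REPS} intervals contain p = {TRUE_P}")
```

Expect a coverage close to 95%, not exactly 95%: ±2 SE is a rounded rule, and proportions from samples of a fixed size can only take certain values.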

Confidence Intervals

A 95% confidence interval can be created by:

sample statistic ± 2 standard deviations of the sampling distribution
(point estimate ± margin of error)

The point estimate is calculated from our sample. The margin of error is calculated from the sampling distribution.

Standard Error: The Standard Deviation of the Sampling Distribution

The standard deviation of the sampling distribution (i.e., the distribution of sample statistics) is called the standard error (SE). The separate name clearly distinguishes it from the standard deviation of the sample distribution.
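Putting the two pieces together, here is a minimal Python sketch of one 95% interval: the point estimate comes from a single sample, and the margin of error comes from the spread of a simulated sampling distribution. The true proportion (0.45), the sample size (100), and the number of simulated samples (1,000) are illustrative assumptions; in a real study we usually cannot draw more samples from the population.

```python
import random
import statistics

TRUE_P = 0.45  # hypothetical population proportion; unknown in a real study
N = 100        # sample size
rng = random.Random(3)  # fixed seed so the run is reproducible

def sample_prop(n):
    """Proportion of orange pieces in one random sample of size n."""
    return sum(rng.random() < TRUE_P for _ in range(n)) / n

# Point estimate: computed from the one sample we actually collected.
point_estimate = sample_prop(N)

# Standard error: SD of the sampling distribution (many samples of the same size).
standard_error = statistics.stdev([sample_prop(N) for _ in range(1000)])
margin_of_error = 2 * standard_error

low, high = point_estimate - margin_of_error, point_estimate + margin_of_error
print(f"{point_estimate:.2f} ± {margin_of_error:.3f} -> ({low:.3f}, {high:.3f})")
```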

Summary

To create a plausible range of values for a parameter:
1. Take many random samples from the population, and compute the sample statistic for each sample.
2. Compute the standard error as the standard deviation of all these statistics.
3. Use: sample statistic ± 2 × standard error

One small problem... Often we only have one sample! How can we calculate the variation in sample statistics if we only have one sample?

Assignment

Part I: Graded Problems
3.12, 3.16, 3.24, and 3.54

Part II: (Type up this assignment in a Word document)
Go to http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss10
Find 3 quantitative variables, and for each variable find another quantitative variable that you think is associated with it. Conduct a correlation test to see how correlated they are. For each pair of variables, provide the following information:
- Variable names
- Question related to the variable
- Explain in your own words what this variable is measuring
- The unit used to measure the variable (e.g., years, dollars, inches, etc.)
- Min, Max, Mean, Median, Standard Deviation (Std Dev)
- The correlation score
- An interpretation of the correlation score

Calculating Correlations from the GSS

1. Under the “Analysis” tab, click on the “Correlation matrix” tab.
2. Enter the names of two quantitative variables.
3. Click the button to run the analysis; the correlation statistics will open up in a new window.
4. In the new window, find the correlation (r) score for the two variables.

Calculating the Standard Error

The standard error of a sample statistic is the same thing as the standard deviation of the sampling distribution (i.e., the distribution of sample statistics). To calculate the standard deviation of the sampling distribution, we need the sample statistic for multiple samples.

However, in reality we typically only have one sample! How do we know how much sample statistics vary if we only have one sample?

Economy

A recent survey of 1,502 Americans found that 86% consider the economy a “top priority” for the president and Congress this year. The standard error for this statistic is 0.01.

What is the 95% confidence interval for the true proportion of all Americans that consider the economy a “top priority” for the president and Congress this year?
A. (0.85, 0.87)
B. (0.84, 0.88)
C. (0.82, 0.90)

0.86 ± 2 × 0.01 = (0.84, 0.88)
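The arithmetic for the economy question can be checked in a few lines of Python. The values 0.86 and 0.01 come from the problem; `math.isclose` is used to compare endpoints because floating-point subtraction such as 0.86 - 0.02 may not land exactly on 0.84.

```python
import math

p_hat = 0.86  # sample proportion from the survey
se = 0.01     # standard error given in the problem

low, high = p_hat - 2 * se, p_hat + 2 * se

options = {"A": (0.85, 0.87), "B": (0.84, 0.88), "C": (0.82, 0.90)}
# Compare with isclose because float arithmetic may not give exactly 0.84 or 0.88.
answer = next(
    label for label, (a, b) in options.items()
    if math.isclose(a, low) and math.isclose(b, high)
)
print(f"95% CI: ({low:.2f}, {high:.2f}) -> option {answer}")
```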

Terms

Standard Deviation: Measures the spread of a distribution of values (e.g., the distribution of sample values for variable x).

Standard Error: The standard deviation of the sampling distribution (i.e., the distribution of sample statistics).

Margin of Error: The amount added to and subtracted from a point estimate to calculate a confidence interval for a population parameter.