
Lab 3

Describing Two Variables

We’ll again analyze the class survey data, only this time looking at relationships between two variables. If you have forgotten what the variables are, you can view the survey under Tests and Quizzes on Sakai. For the following exercises, you may use either StatKey or RStudio.

Exercise 1 Choose two categorical variables of interest to you, and describe their relationship. Include summary statistic(s), at least one visualization, and a description in your own words.

Exercise 2 Choose one categorical and one quantitative variable of interest to you, and describe their relationship. Include summary statistic(s), at least one visualization, and a description in your own words.

Exercise 3 Choose two quantitative variables of interest to you, and describe their relationship. Include summary statistic(s), at least one visualization, and a description in your own words.

Tips for StatKey: The data is at https://docs.google.com/spreadsheet/ccc?key=0AtJ5X5rxFtfqdFI4QU8tcW1HOFBlMUFYNXBRc3RBMlE. To copy and paste two columns at once into StatKey, the columns need to be next to each other. I recommend copying and pasting the entire spreadsheet into your own Google doc, in which you can then move columns around. Remember to delete any rows with NA values (unless the variable is categorical, in which case NA values can be treated as a separate category). For two categorical variables, remember to check the “Raw Data” box.

Tips for RStudio: Relevant commands for RStudio can be found at http://stat.duke.edu/courses/Fall12/sta101.002/Rcommands.pdf. Run the following two lines to update the functions for this class and to get the data:

source("/shared/[email protected]/Lock5.R")
survey = google.doc("0AtJ5X5rxFtfqdFI4QU8tcW1HOFBlMUFYNXBRc3RBMlE")
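As a rough illustration of the three kinds of description asked for in Exercises 1 to 3, here is a minimal sketch in base R. It uses a small made-up data frame called toy (a hypothetical stand-in, not the class survey), and its variable names are invented for the example. The commands in the course's Rcommands.pdf may differ from these base R calls.

```r
# Hypothetical toy data; the real class survey has different variables.
toy <- data.frame(
  year   = c("First", "Second", "First", "Second", "First", "Second"),
  pet    = c("Dog", "Cat", "Dog", "Dog", "Cat", "Cat"),
  sleep  = c(7, 6, 8, 5, 7.5, 6.5),
  coffee = c(1, 3, 0, 4, 1, 2)
)

# Two categorical variables: two-way table of counts and row proportions.
counts <- table(toy$year, toy$pet)
row_props <- prop.table(counts, margin = 1)
barplot(counts, beside = TRUE, legend.text = TRUE)

# One categorical and one quantitative variable:
# mean of the quantitative variable within each group, plus side-by-side boxplots.
group_means <- tapply(toy$sleep, toy$year, mean)
boxplot(sleep ~ year, data = toy)

# Two quantitative variables: correlation and a scatterplot.
r <- cor(toy$sleep, toy$coffee)
plot(toy$sleep, toy$coffee)
```

With the real data, replace toy and its columns with survey and the variables you chose.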

Sampling Distribution of a Proportion: StatKey

First, we’ll explore a sampling distribution for a proportion using StatKey. Click on “Proportion” in the row titled “Sampling Distributions”. By default our population is US adults, and our parameter of interest is the proportion of college graduates, p = 0.275.¹ We will repeatedly take random samples from this population.

Exercise 4 Generate one random sample of size 50.² How many people in your sample are college graduates? What is p̂? Generate another sample. What is p̂ for this sample? Find both of the corresponding dots in the sampling distribution.

Exercise 5 Generate 1000 random samples of size 50. How far does p̂ tend to vary from the true p? What is the standard error of p̂ for samples of size 50 sampled randomly from this population?

¹ We do not know this exactly, but can get a very precise estimate of this population proportion, so are treating it as known.
² You can set the sample size in the top right, and can generate one sample by clicking “Generate One Sample” on the top left.

Exercise 6 What is the standard error of p̂ for samples of size 500 sampled randomly from this population?
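The StatKey experiment above can also be imitated in R. The following is a minimal sketch, not part of the lab's required workflow. It assumes the population is large enough that the number of graduates in a sample can be drawn from a binomial distribution with p = 0.275. The helper simulate_phat is a name made up for this example.

```r
set.seed(1)   # any seed; it only makes the run reproducible
p <- 0.275    # population proportion of college graduates (treated as known)

# Simulate `reps` random samples of size n and return the sample proportions.
# Assumption: a binomial draw stands in for sampling from a very large population.
simulate_phat <- function(n, reps = 1000) {
  rbinom(reps, size = n, prob = p) / n
}

phat_50  <- simulate_phat(50)
phat_500 <- simulate_phat(500)

# The standard error is the standard deviation of the sample proportions.
se_50  <- sd(phat_50)
se_500 <- sd(phat_500)

# Theoretical values for comparison: sqrt(p * (1 - p) / n).
theory_50  <- sqrt(p * (1 - p) / 50)
theory_500 <- sqrt(p * (1 - p) / 500)
```

Comparing se_50 with se_500 shows how the spread of p̂ shrinks as the sample size grows. The simulated values should land close to the theoretical ones but will not match them exactly.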

Sampling Distribution of a Mean: RStudio

Load the RockandRoll data in RStudio. This is data on a population, because we have data on all inductees into the Rock and Roll Hall of Fame as of 2012. We’ll construct sampling distributions for the average number of people per group/individual inducted.

Exercise 7 What is the average number of people, µ, for groups or individuals inducted into the Rock and Roll Hall of Fame?

Exercise 8 Take a random sample of 10 inductees to the Rock and Roll Hall of Fame. Who are they?

As a reminder, we can use sample to draw a random sample of inductees, but we can also use sample to directly draw a random sample of values from a variable. We can also nest commands, so we can put one command inside of another. For example, we can take the mean of a random sample of 10 values of People with

mean(sample(RockandRoll$People, 10))

Exercise 9 Find the mean group size for a random sample of 10 inductees to the Rock and Roll Hall of Fame. Repeat for a second random sample.

One command that will come in very useful in the coming weeks is do, which allows you to “do” any command multiple times, without having to type it hundreds of times. Simply typing do(20)* in front of any command will do it 20 times. Try repeating the above command 20 times with do: you should get 20 different numbers, each a mean group size for a random sample of 10 different inductees.
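The do command is presumably supplied by the functions loaded for this class, so it may not exist in a plain R session. As a rough base R analogue (a substitute, not the course's own command), replicate repeats an expression a given number of times. The sketch below uses a made-up vector people as a hypothetical stand-in for RockandRoll$People.

```r
set.seed(2)   # any seed; it only makes the run reproducible

# Hypothetical stand-in for RockandRoll$People (NOT the real data).
people <- c(1, 1, 1, 4, 5, 1, 3, 4, 1, 2, 6, 1, 5, 4, 1, 3, 1, 2, 4, 1)

# Nested commands: draw 10 values at random, then average them.
one_mean <- mean(sample(people, 10))

# Repeat the nested command 20 times; each repetition draws a fresh sample,
# similar in spirit to putting do(20)* in front of the command.
twenty_means <- replicate(20, mean(sample(people, 10)))
```

Here twenty_means holds 20 sample means, one per random sample of size 10.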

Exercise 10 Take 1000 random samples of size 10 from Rock and Roll Hall of Fame inductees, and for each sample compute the mean, x̄. Create a plot (such as a histogram) of this sampling distribution. How far do the sample means, x̄, tend to vary from the true population mean, µ?

Exercise 11 What is the standard error of x̄, the sample mean group size, when samples of size 10 are taken from this population?

Exercise 12 What is the standard error of x̄, the sample mean group size, when samples of size 30 are taken from this population?
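To show the mechanics of building a sampling distribution of a mean and reading off a standard error, here is a minimal base R sketch. It uses an invented population of group sizes, not the RockandRoll data, so its numbers say nothing about the real answers. It also uses replicate in place of do, as an assumption about what is available outside the class setup.

```r
set.seed(3)   # any seed; it only makes the run reproducible

# Hypothetical population of group sizes (NOT the RockandRoll data).
population <- rep(1:6, times = c(120, 20, 30, 50, 40, 13))
mu <- mean(population)   # true population mean

# 1000 sample means for each sample size (sample() draws without replacement).
means_10 <- replicate(1000, mean(sample(population, 10)))
means_30 <- replicate(1000, mean(sample(population, 30)))

# Plot the sampling distribution for n = 10.
hist(means_10, main = "Sample means, n = 10", xlab = "Sample mean")

# The standard error is the standard deviation of the sample means.
se_10 <- sd(means_10)
se_30 <- sd(means_30)
```

The histogram should be centered near mu, and se_30 should come out smaller than se_10, since larger samples give less variable means.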
