Unit 1: Introduction to Data Lecture 1: Data Collection, Observational
Unit 1: Introduction to data
Lecture 1: Data collection, observational studies, and experiments
Statistics 101
Mine Çetinkaya-Rundel
January 15, 2013

Have you ...
- been placed into a team?
- successfully logged on to RStudio?
If not, see me after class.
Also, there are still a few of you who haven’t completed the class survey; please do so ASAP.
http://stat.duke.edu/courses/Spring13/sta101.001/schedule.html

Statistics 101 (Mine Çetinkaya-Rundel) U1 - L1: Data coll., obs. studies, experiments January 15, 2013 2 / 31

Readiness assessment
- 15 mins for individual, answer using clickers
- 15 mins for team, answer using scratch-off sheets
  - 1 pt for each question correct on the first try
  - 0.5 pts for each question correct on the second try
  - no points for more than 2 tries
- Write your team name and tally your scores
- A representative from each team turns in the scratch-off sheets and all paper copies of assessments

Overview of data collection principles

Anecdotal evidence

Anecdotal evidence and early smoking research
- Anti-smoking research started in the 1930s and 1940s when cigarette smoking became increasingly popular. While some smokers seemed to be sensitive to cigarette smoke, others were completely unaffected.
- Anti-smoking research was faced with resistance based on anecdotal evidence such as “My uncle smokes three packs a day and he’s in perfectly good health”, evidence based on a limited sample size that might not be representative of the population.
- It was concluded that “smoking is a complex human behavior, by its nature difficult to study, confounded by human variability.”
- In time, researchers were able to examine larger samples of cases (smokers), and trends showing that smoking has negative health impacts became much clearer.
Brandt, The Cigarette Century (2009), Basic Books.

Populations and samples
- Research question: Can people become better, more efficient runners on their own, merely by running?
- Population of interest:
- Sample: Group of adult women who recently joined a running group
- Population to which results can be generalized:
http://well.blogs.nytimes.com/2012/08/29/finding-your-ideal-running-form

Sampling from a population

Census
- Wouldn’t it be better to just include everyone and “sample” the entire population? This is called a census.
- There are problems with taking a census:
  - It can be difficult to complete a census: there always seem to be some individuals who are hard to locate or hard to measure. And there may be certain characteristics about those individuals who are hard to locate.
  - Populations rarely stand still. Even if you could take a census, the population changes constantly, so it’s never possible to get a perfect measure.
  - Taking a census may be more complex than sampling.

Exploratory analysis to inference

Sampling is natural. Think about sampling something you are cooking: you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.
- When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough, that’s exploratory analysis.
- If you generalize and conclude that your entire soup needs salt, that’s an inference.
- For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).
- If your spoonful comes only from the surface and the salt has collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.
- If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.
http://www.npr.org/templates/story/story.php?storyId=125380052

Sampling methods

Simple random sample
Randomly select cases from the population; each case is equally likely to be selected.

Stratified sample
Strata are homogeneous; take a simple random sample from each stratum.
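The two sampling schemes just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the course materials: the function names `simple_random_sample` and `stratified_sample`, the made-up population of 60 households, and the housing-type labels used as strata are all hypothetical.

```python
import random
from collections import defaultdict


def simple_random_sample(cases, n, seed=None):
    """Draw n cases without replacement; every case is equally likely."""
    rng = random.Random(seed)
    return rng.sample(cases, n)


def stratified_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Group cases into strata, then take a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Never ask for more cases than the stratum contains.
        k = min(n_per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 households, each tagged with a housing type.
housing_types = ["large home", "apartment", "mixed"]
population = [(i, housing_types[i % 3]) for i in range(60)]

srs = simple_random_sample(population, 12, seed=1)
strat = stratified_sample(population, lambda case: case[1], 4, seed=1)

print(len(srs), len(strat))
```

Both calls return 12 households, but only the stratified sample is guaranteed to contain exactly 4 from each housing type; the simple random sample may over- or under-represent a type by chance.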
[Figure: cases in the population grouped into Stratum 1 through Stratum 6]

Cluster sample
Clusters are not necessarily homogeneous; take a simple random sample from a random sample of clusters. Usually preferred for economic reasons.

[Figure: cases in the population grouped into Cluster 1 through Cluster 9]

Clicker question
A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments, and others a diverse mixture of housing structures. Which approach would likely be the least effective?
(a) Simple random sampling
(b) Cluster sampling
(c) Stratified sampling
(d) Blocked sampling
(e) Anecdotal sampling

Sampling bias

A few sources of bias
- Non-response: If only a small fraction of the randomly sampled people choose to respond to a survey, the sample may no longer be representative of the population.
- Voluntary response: Occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue. Such a sample will also not be representative of the population.

Landon vs. FDR
A historical example of a biased sample yielding misleading results:
In 1936, Landon sought the Republican presidential nomination opposing