<p>STP 420 SUMMER 2005</p><p>STP 420 INTRODUCTION TO APPLIED STATISTICS</p><p>NOTES</p><p>PART 1 - DATA</p><p>CHAPTER 3</p><p>PRODUCING DATA</p><p>Introduction</p><p>Exploratory data analysis – covered in Ch. 1 and Ch. 2. Uses graphs and numerical summaries to uncover the nature of a data set.</p><p>On its own, exploratory analysis rarely provides convincing evidence for its conclusions.</p><p>Formal statistical inference – answers specific questions with a known degree of confidence. It uses the descriptive tools from the previous chapters along with new kinds of reasoning (numerical rather than graphical).</p><p>3.1 First Steps</p><p>Major questions when trying to produce data:</p><p>1. What individuals shall you study?</p><p>2. What variables shall you measure?</p><p>Designs – arrangements or patterns used to collect data from many individuals</p><p>Some questions addressed by designs:</p><p>1. How many individuals shall you collect data from?</p><p>2. How shall you select the individuals to be studied?</p><p>3. How shall you form groups where relevant?</p><p>Without a sound design, you may be misled by haphazard or incomplete data or by confounding (the effects of several variables on the same response are mixed together, so you cannot tell which variable has which effect).</p><p>Where to find data: library and internet</p><p>Anecdotal evidence – based on haphazardly selected individual cases that come to our attention but may not be representative of any larger group of cases.</p><p>There are many places to find data, including:</p><p>The annual Statistical Abstract of the United States</p><p>US Census Bureau</p><p>Website: http://www.fedstats.gov</p><p>Available data – data produced in the past for some other purpose that may help answer a present question</p><p>Producing new data is expensive, so available data are used whenever possible.</p><p>Statistical designs used for producing data rely on sampling or experiments.</p><p>Sample – group of individuals/subjects from which data 
is gathered, chosen to represent a larger body or population</p><p>Census – information is gathered from the whole population. A census is time consuming and expensive, which is why samples are used instead.</p><p>Observational study – observes individuals and measures variables of interest but does not attempt to influence the responses</p><p>Experiment – deliberately imposes some treatment on individuals in order to observe their responses</p><p>3.2 Design of Experiments</p><p>Experimental units – individuals/subjects on which the experiment is done</p><p>Treatment – specific experimental condition applied to the units</p><p>Factors – explanatory variables in an experiment</p><p>Level of a factor – specific value of a factor/variable</p><p>Placebo – dummy treatment</p><p>In principle, experiments can give good evidence for causation.</p><p>Comparative experiments</p><p>A simple design with only a single treatment is:</p><p>Treatment → Observe response</p><p>Control group – group that is given the dummy treatment/placebo</p><p>It is better to run an experiment with at least two groups: one given the treatment and the other given the dummy treatment. Comparing the groups separates the effect of the treatment from other influences.</p><p>The design of a study is biased if it systematically favors certain outcomes.</p><p>Randomization – use of chance to divide experimental units into groups</p><p>Principles of Experimental Design</p><p>The basic principles of statistical design of experiments are:</p><p>1. Control of the effects of lurking variables on the response, most simply by comparing several treatments.</p><p>2. Randomization, the use of impersonal chance to assign experimental units to treatments.</p><p>3. 
Replication of the experiment on many units to reduce chance variation in the results.</p><p>Statistically significant – an observed effect so large that it would rarely occur by chance</p><p>How to randomize</p><p>Drawing names/numbers from a hat; a table of random digits; computer software</p><p>Random digits</p><p>Table of random digits – list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 that has the following properties:</p><p>1. The digit in any position in the list has the same chance of being any one of 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.</p><p>2. The digits in different positions are independent in the sense that the value of one has no influence on the value of any other.</p><p>Completely randomized design – experimental design in which all experimental units are allocated at random among all treatments</p><p>Cautions about experimentation</p><p>Double-blind – neither the subjects nor the researchers who work with them know which subject got which treatment.</p><p>Lack of realism – the subjects, treatments, or setting of an experiment may not realistically duplicate the conditions we really want to study.</p><p>Matched pairs designs</p><p>A matched pairs design can produce more precise results than a completely randomized design. It uses the principles of comparison of treatments, randomization, and replication on several experimental units. It is an example of a block design.</p><p>Block designs</p><p>Block – group of experimental units or subjects that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.</p><p>In a block design, the random assignment of units to treatments is carried out separately within each block.</p><p>3.3 Sampling Design</p><p>Population – entire group of individuals that we want information about</p><p>Sample – a part of the population that we actually examine in order to gather information</p><p>Sample design – method used to choose the sample from the population</p><p>Voluntary response sample – 
consists of people who choose themselves by responding to a general appeal. Biased because people with strong opinions, especially negative ones, are the most likely to respond.</p><p>Simple random samples</p><p>Simple random sample (SRS) – consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.</p><p>Stratified samples</p><p>Probability sample – gives each member of the population a known chance (greater than 0) to be selected</p><p>Stratified random sample – first divide the population into groups of similar individuals called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.</p><p>Multistage samples</p><p>Example:</p><p>1. Select a sample from the 3000 counties in the US.</p><p>2. Select a sample of townships within each of the counties chosen.</p><p>3. Select a sample of city blocks or other small areas within each chosen township.</p><p>4. Take a sample of households within each block chosen.</p><p>Sampling in stages makes a national sample practical: no complete list of households is needed, and the households to be contacted are clustered in a manageable number of areas.</p><p>Cautions about sample surveys</p><p>Undercoverage – occurs when some groups in the population are left out of the process of choosing the sample.</p><p>Nonresponse – occurs when an individual chosen for the sample can’t be contacted or does not cooperate.</p><p>Response bias – caused by the behavior of the respondent/subject or the interviewer</p><p>Wording of questions – the most important influence on the answers given to a sample survey. Confusing or leading questions can introduce strong bias and may even change the outcome of a survey.</p><p>3.4 Toward Statistical Inference</p><p>Statistical inference – using data from a sample to draw conclusions about the wider population</p><p>Parameter – a number that describes the population. It is fixed but unknown.</p><p>Statistic – a number that describes a sample. 
Its value is known once we have taken a sample, but it can vary from sample to sample. We use a sample statistic to estimate an unknown population parameter.</p><p>Sampling variability – the value of a statistic varies from one random sample to another drawn from the same population</p><p>Simulation – using random digits from a table or computer software to imitate chance behavior, because actually taking many samples may be expensive and time consuming</p><p>Sampling distribution of a statistic – the distribution of values taken by the statistic in all possible samples of the same size from the same population. For many common statistics, such as sample proportions and means from large random samples, it is approximately normal.</p><p>The bias of a statistic</p><p>Unbiased estimator – a statistic whose sampling distribution has a mean equal to the true value of the parameter being estimated</p><p>The variability of a statistic – described by the spread of its sampling distribution. The spread is determined by the sampling design and the sample size n. Statistics from larger samples have smaller spreads.</p><p>As long as the population is much larger than the sample (at least 10 times as large), the spread of the sampling distribution for a sample of fixed size n is approximately the same for any population size.</p><p>Bias and variability</p><p>High bias, low variability: off target but close together</p><p>Low bias, high variability: on target but spread out</p><p>High bias, high variability: off target and spread out</p><p>Low bias, low variability: on target and close together</p><p>Why randomize</p><p>Randomization guarantees that the results of analyzing our data are subject to the laws of probability. It eliminates bias in choosing the sample or assigning treatments. As a result, the sampling distribution of an unbiased statistic is usually approximately normal in shape and centered at the true value of the parameter.</p>
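<p>The ideas of simulation, sampling distribution, bias, and variability from section 3.4 can be illustrated with a short program. The following is only a minimal sketch, not part of the original notes. It assumes a very large population in which a proportion 0.6 has some characteristic (the value 0.6, the sample sizes, and all function and variable names are hypothetical choices for illustration). It simulates many random samples and compares the center and spread of the resulting sample proportions for two sample sizes.</p>

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Simulate num_samples random samples of size n and return the
    sample proportion from each one.

    Assumption: the population is much larger than the sample, so each
    sampled individual independently has the characteristic with
    probability p.
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


P_TRUE = 0.6  # hypothetical population parameter

# Approximate sampling distributions for a small and a large sample size.
small = simulate_sample_proportions(P_TRUE, n=100, num_samples=1000, seed=0)
large = simulate_sample_proportions(P_TRUE, n=2500, num_samples=1000, seed=1)

# Center (mean) and spread (standard deviation) of each distribution.
center_small = statistics.mean(small)
center_large = statistics.mean(large)
spread_small = statistics.stdev(small)
spread_large = statistics.stdev(large)

print(f"n = 100:  center {center_small:.3f}, spread {spread_small:.4f}")
print(f"n = 2500: center {center_large:.3f}, spread {spread_large:.4f}")
```

<p>Under these assumptions, both centers should land close to 0.6 (the sample proportion is an unbiased estimator), while the spread for n = 2500 should be roughly one fifth of the spread for n = 100, which is the sense in which larger samples have smaller spreads.</p>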