Statistics Workshop Notes
SCIENCE & RESEARCH INTERNAL REPORT 187

Jennifer Brown and Bryan F.J. Manly

Published by
Department of Conservation
PO Box 10-420
Wellington, New Zealand

Science & Research Internal Reports are written by DOC staff or contract scientists on matters which are on-going within the Department. They include reports on conferences, workshops and study tours, and also work in progress. Internal reports are not normally subject to peer review.

Publication was approved by the Manager, Southern Regional Office, Department of Conservation, Christchurch.

© March 2001, Department of Conservation
ISSN 0114-2798
ISBN 0-478-22052-9

Contents

Module 1: Background to Data Analysis
Module 2: Sample Survey Designs
Module 3: Design for Monitoring Schemes
Module 4: Models for Analysis
Module 5: Detection of Trends and Change Points
Module 6: BACI Designs
Appendix 1: Worksheets for Practice on SPSS by Workshop Participants
Appendix 2: Examples of Analyses in SPSS
Appendix 3: Department of Conservation Data Sets
Appendix 4: Where the Answers to the Questions in the Modules can be Found
Glossary

MODULE 1: BACKGROUND TO DATA ANALYSIS

Contents

Summary
1.1 The Starting Point
1.2 Drawing Conclusions from Data
1.3 Observational and Experimental Studies
1.4 True Experiments and Quasi-Experiments
1.5 Design-Based and Model-Based Inference
1.6 Tests of Significance and Confidence Intervals
1.7 Randomization Tests
    Example: The Effect of 1080 Poison Pellets on Invertebrates
1.8 Bootstrapping
    Example: Finding a 95% Confidence Interval for the Mean Chlorophyll-a in Lakes
1.9 Pseudoreplication
1.10 Multiple Testing
    Example: Multiple Tests on Correlations Between Characters for Brazilian Fish
1.11 Meta-Analysis: Methods for Combining Results from Several Studies
1.12 Bayesian Inference
1.13 Data Quality Objectives (DQO) Process
1.14 Key Points in This Module
1.15 Questions About This Module
References

SUMMARY

This module begins by stating the background knowledge of statistics that is needed to fully understand the material in this and the other modules in this document. Briefly, what is required is a knowledge of what discrete and continuous statistical distributions are, the concept of a sampling distribution, an understanding of the structure of a test of significance, and an understanding of what a confidence interval means.

The module then covers a number of general issues related to the design and analysis of studies:

• The difference between observational studies (with passive observation only) and experimental studies (with the manipulation of conditions).
• The difference between true experiments (with randomization, replication and controls), and quasi-experiments (with one or more of these components missing), and how this affects the strength of the conclusions that can be drawn.
• The difference between design-based inference (which draws its validity from random sampling), and model-based inference (which relies on the assumed model being more or less correct).
• The current controversy about the value of tests of significance, and whether using confidence limits instead overcomes the perceived problems.
• The computer-intensive methods of randomization and bootstrapping that are receiving increasing use in all areas of science.
• What pseudoreplication is, and how it can be avoided.
• Whether and when adjustments for multiple testing should be made when analysing data.
• Meta-analysis methods for combining the results of several studies on the same variable.
• The difference between classical statistical methods and the Bayesian methods that are becoming popular with some data analysts.
• Data quality objective (DQO) procedures for ensuring that when studies are finished they will meet the original objectives.

1.1 The Starting Point

This module and the ones that follow assume that readers start with a background knowledge of statistics at the level usually reached or exceeded after a typical first-year university course. Table 1.1 lists what this should include. What matters is not so much familiarity with all the details of the items listed as an understanding of the concepts involved. For example, with tests of significance it is not necessary to be able to carry out the calculations for a range of tests without looking up the equations in a textbook. However, it is important to understand the logic behind these tests, i.e. the idea of setting up a null hypothesis and testing it by comparing the observed value of a test statistic with the distribution that the statistic will have if the null hypothesis is correct.

There are many statistics texts available that cover the material in Table 1.1. If you are feeling a little statistically “rusty” then some revision using one of these texts may be useful.

1.2 Drawing Conclusions from Data

Statistics is all about drawing conclusions from data, and in this module we look at the basis of some of the methods that are used for drawing conclusions. Quite a variety of topics are considered, including some which are rather important and yet often receive relatively little attention in statistics texts.
These include the difference between observational and experimental studies, the difference between inference based on the random sampling design used to collect data and inference based on the assumption of a particular model for the data, criticisms that have been raised about the excessive use of significance tests, the use of the computer-intensive methods of randomization and bootstrapping instead of more conventional methods, the avoidance of pseudoreplication, the use of sampling methods where sample units have different probabilities of selection, the problem of multiple testing, meta-analysis (methods for combining the results from different studies), and the use of Bayesian inference, which is currently receiving a great deal of attention.

TABLE 1.1 STATISTICAL BACKGROUND THAT IS ASSUMED IN THIS AND THE FOLLOWING MODULES

CONCEPT: WHAT SHOULD BE KNOWN

Random variation in data: How observations taken under apparently similar conditions display variation, which can be described by statistical distributions such as the normal distribution for continuous data, and the binomial distribution for discrete (count) data.

Summary statistics: How the mean, standard deviation, etc. are used to summarise a sample or a theoretical distribution.

Distributions for sample statistics: The standard error of the mean, SE(x̄) = σ/√n. The use of the t-distribution for inferences about sample means. The uses of the chi-squared distribution with count data.

Tests of significance: The logic behind tests of significance, including the difference between one and two sided tests, the meaning of the significance level, and the role of the null and alternative hypotheses. The use of one and two sample t-tests, chi-squared goodness of fit tests.

Confidence limits: The interpretation of a confidence interval as one within which a population parameter will lie with a stated probability.

Analysis of variance: The partitioning of the total sum of squares about the mean for a set of data into components associated with different factors and their interactions, the summary of this in an analysis of variance table, and F-tests for significant effects, for factorial experiments only (i.e. one factor analysis of variance, two factor analysis of variance, etc.)

Regression: The idea of accounting for the variation in a dependent variable Y in terms of the variation in one or more X variables. The uses of the t-distribution and F-distribution to determine which of the X variables are important.

People often use statistical methods without giving much thought to why these methods lead to valid conclusions - if indeed they do! This module is intended to make you think more critically about these matters.

1.3 Observational and Experimental Studies

When considering the nature of empirical studies there is an important distinction between observational and experimental studies. With observational studies, data are collected by observing populations in a passive manner that as far as possible will not change the processes going on. For example, samples of animals might be collected in order to estimate the proportions in different age classes or the sex ratio. On the other hand, experimental studies are usually thought of as involving the collection of data with some manipulation of variables that is assumed to affect population parameters, keeping other variables constant as far as possible. An example of this type would be a study where possums are removed from an area to see whether this leads to improved survival of an endangered plant.

In many cases the same statistical analysis can be used with either observational or experimental data. However, the validity of any inferences that result from the analysis depends very much on the type of study.
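
As a hedged illustration of why the type of study matters, the following minimal Python sketch applies the same analysis (a difference in group means) to simulated observational and randomized data. Everything in it is an assumption made for illustration and is not taken from the report: the function name `simulate`, the hidden variable (which could stand for something like site quality), the sample size, and the effect sizes. The simulated treatment has no real effect on the outcome.

```python
import random
import statistics


def simulate(randomized, n=2000, seed=1):
    """Return mean(outcome | treated) - mean(outcome | untreated).

    Hypothetical setup: the treatment has NO effect on the outcome.
    A hidden, unmeasured variable raises the outcome. In the
    observational case it also makes treatment more likely; in the
    randomized case treatment is assigned by a fair coin toss.
    """
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        hidden = rng.random()  # unmeasured variable in [0, 1)
        if randomized:
            gets_treatment = rng.random() < 0.5
        else:
            gets_treatment = rng.random() < hidden
        # The outcome depends on the hidden variable and noise only.
        outcome = 10.0 * hidden + rng.gauss(0.0, 1.0)
        (treated if gets_treatment else untreated).append(outcome)
    return statistics.mean(treated) - statistics.mean(untreated)


observational_diff = simulate(randomized=False)
experimental_diff = simulate(randomized=True)
print(f"observational difference in means: {observational_diff:.2f}")
print(f"randomized difference in means:    {experimental_diff:.2f}")
```

Under these assumptions the observational comparison shows a sizeable difference between groups even though the treatment does nothing, because the hidden variable drives both treatment and outcome, while the randomized comparison gives a difference close to zero.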
In particular, an effect that is seen consistently in replications of a well designed experiment can only reasonably be explained as being caused by the manipulation of the experimental variables. But with an observational study the same consistency of results might be obtained because all the data are affected in the same way by some unknown and unmeasured variable. Therefore the ‘obvious’ explanation for an effect that is seen in the results of an observational study may be quite wrong. To put it another way, the conclusions from observational studies are not necessarily wrong. The problem is that there is little assurance that they are right (Hairston, 1989, p.