Advanced Statistics for Environmental Professionals
Bernard J. Morzuch
Department of Resource Economics
University of Massachusetts
Amherst, Massachusetts
[email protected]

Table of Contents

How Does A Statistic Like A Sample Mean Behave?
The Central Limit Theorem
The Standard Normal Distribution
Statistical Estimation
The t-distribution
Appearance Of The t-distribution
Situation Where We Use t In Place Of z: Confidence Intervals
t-table
An Upper One-Sided (1-α) Confidence Interval For µ
Another Confidence Interval Example
Summary And Words Of Caution When Using t Or z
Treatment Of Outliers And Testing Suggestions
A Simple Approach For Assessing Data Distribution And The Possibility Of Outliers
A Data Set's Five-Number Summary And Box-And-Whisker Diagram (Or Boxplot)
Interquartile Range (IQR) And Outliers
Examples
Hypothesis Testing: The Classical Approach (Test Of One Mean)
    Step 1: State the null and alternative hypotheses
    Step 2: Decide upon a tail probability associated with the null hypothesis being true
    Step 3: Establish a decision rule to assist in choosing between hypotheses
    Step 4: Generate your samples. Calculate the test statistic
    Step 5: Apply the decision rule. Make a decision. State your conclusion in words
The P-Value Approach To Hypothesis Testing
Complementarity Between Hypothesis Testing And Confidence Interval Construction
Testing For Normality: The Shapiro-Wilk Test
Hypothesis Testing: Comparison Between Two Means
    Step 1: State the null and alternative hypotheses
    Step 2: Decide upon a tail probability associated with the null hypothesis being true
    Step 3: Establish a decision rule to assist in choosing between hypotheses
    Step 4: Generate your samples. Calculate the test statistic
    Step 5: Apply the decision rule. Make a decision. State your conclusion in words
Incorrect Decisions In Hypothesis Testing
A Calculation For β and 1-β
Sample Size Issues
Behavior Of Observations Having A Lognormal Distribution
Small Sample Sizes And Parent Distribution Departing From Normality
An Experimental Design: Set-Up For Generating Lognormal Parameter Estimates
Parameter Estimators For A Lognormal Distribution
Getting Parameter Estimates: Probability Plotting
Land's Approach To Get A Confidence Interval
Dealing With Censored Data Sets
Getting Parameter Estimates: Censored Data And Probability Plotting
Strategies To Determine The Proper Number Of Samples
    Sample Size Based on Variance of the Sample Mean
    Sample Size Based on Margin of Error of the Sample Mean
    Sample Size Based on Relative Error of the Sample Mean
Nonparametric Statistical Tests
The Mann-Whitney Test
Summary
References

How Does A Statistic Like A Sample Mean Behave?

Motivation: You are at a wetland site, and you would like to get an estimate of the true mean level of lead concentration in the soil. (Unknown to you, suppose that the population mean -- the true overall mean level of lead concentration in the soil -- is 40 mg/kg. Suppose also that the standard deviation of an infinite number of measures is 15 mg/kg. And suppose that the distribution of this infinite number of measures is not normal but skewed to the right.)

Q: How do you proceed in generating your estimate?
A: You might rely on an experimental design whereby you walk in a straight line across the site and take a new soil sample every so many meters. You repeat this process for lines that are parallel to the original. When sufficient parallel lines are walked off, the process is repeated in the perpendicular direction. Ultimately, you generate a "sufficient" number of samples that you believe characterize the soil conditions for this particular wetland site.
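The hypothetical population described in the Motivation can be made concrete with a short simulation. The sketch below is only an illustration: the text says the population is right-skewed with mean 40 mg/kg and standard deviation 15 mg/kg but does not name a distribution, so the lognormal shape used here is an assumption, as are the function names, the simulated population size, and the random seed.

```python
import math
import random
import statistics


def lognormal_params(mean, sd):
    """Return (mu, sigma) of the underlying normal distribution for a
    lognormal variable with the given arithmetic mean and standard deviation."""
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)


def draw_population(mean=40.0, sd=15.0, size=100_000, seed=1):
    """Simulate lead concentrations (mg/kg) from the ASSUMED lognormal population.

    A large finite draw stands in for the 'infinite number of measures'
    mentioned in the text.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    mu, sigma = lognormal_params(mean, sd)
    return [rng.lognormvariate(mu, sigma) for _ in range(size)]


values = draw_population()
pop_mean = statistics.fmean(values)
pop_sd = statistics.pstdev(values)
pop_median = statistics.median(values)

print(f"mean   = {pop_mean:.2f} mg/kg")
print(f"sd     = {pop_sd:.2f} mg/kg")
print(f"median = {pop_median:.2f} mg/kg")
```

If the lognormal assumption is reasonable, the simulated mean and standard deviation should land close to 40 and 15 mg/kg, and the median should fall below the mean, which is the usual signature of a distribution skewed to the right.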
Suppose that the number of samples that you take is n = 15. Here are their ordered measurements, in mg/kg:

17.1 21.4 23.3 23.4 24.7 25.6 26.9 26.9 27.3 29.2 33.1 37.3 44.9 49.8 58.8

Q: For these 15 samples, what would be an overall representative measure of lead concentration in the soil?
A: The sample mean X̄. For these 15 observations, X̄ = 31.32 mg/kg.

Now, suppose a colleague was requested to generate 15 soil samples at this same site using the same experimental design. Assume that she does not know where you walked off your first line, so that she starts her walk at a different spot than you.

Q: Would you expect her to get the same 15 numbers for her soil samples as you?
A: No.

Q: Would you expect her sample mean -- based upon her 15 samples -- to be the same as your sample mean?
A: No.

Q: Why not?
A: Because sampling involves error; i.e., we never incorporate all aspects of the phenomenon that we are attempting to measure.

Suppose an additional 498 of your colleagues were asked to repeat this experiment.

Q: How many sample means will have been generated in total, beginning with yours?
A: 500.

Suppose you were asked to construct a histogram for these 500 sample means. Notice that you are being asked to construct a histogram for sample means, not for individual observations.

Q: Where would you expect the histogram to be centered?
A: Around µ = 40 mg/kg, which is the true mean.

Notice that these 500 sample means will have a spread, i.e., a standard deviation.

Q: Will the standard deviation of the sample means be related to the standard deviation of the individual observations, i.e., to σ = 15 mg/kg?
A: This is hard to tell, but the answer is "yes".

Recall that the distribution of the individual observations was said to be skewed.

Q: What will the shape