
Advanced Statistics for Environmental Professionals

Bernard J. Morzuch
Department of Resource Economics
University of Massachusetts
Amherst, Massachusetts
[email protected]

May 2005

Table of Contents

TOPIC                                                                      PAGE

How Does A Statistic Like A Sample Mean Behave? ...... 1
The Central Limit Theorem ...... 3
The Standard Normal Distribution ...... 5
Statistical Estimation ...... 5
The t-distribution ...... 13
Appearance Of The t-distribution ...... 14
Situation Where We Use t In Place Of z: Confidence Intervals ...... 16
t-table ...... 17
An Upper One-Sided (1−α) Confidence Interval For µ ...... 18
Another Confidence Interval Example ...... 18
Summary And Words Of Caution When Using t Or z ...... 20
Treatment Of Outliers And Testing Suggestions ...... 20
A Simple Approach For Assessing Data Distribution And The Possibility Of Outliers ...... 20
A Data Set’s Five-Number Summary And Box-And-Whisker Diagram (Or Boxplot) ...... 20
Interquartile Range (IQR) And Outliers ...... 22
Examples ...... 22
Hypothesis Testing: The Classical Approach (Test Of One Mean) ...... 23
Step 1: State the null and alternative hypotheses ...... 23
Step 2: Decide upon a tail probability associated with the null hypothesis being true ...... 25
Step 3: Establish a decision rule to assist in choosing between hypotheses ...... 25
Step 4: Generate your samples. Calculate the test statistic ...... 26
Step 5: Apply the decision rule. Make a decision. State your conclusion in words ...... 27
The P-Value Approach To Hypothesis Testing ...... 27
Complementarity Between Hypothesis Testing And Confidence Interval Construction ...... 28
Testing For Normality: The Shapiro-Wilk Test ...... 28
Hypothesis Testing: Comparison Between Two Means ...... 31
Step 1: State the null and alternative hypotheses ...... 33
Step 2: Decide upon a tail probability associated with the null hypothesis being true ...... 33
Step 3: Establish a decision rule to assist in choosing between hypotheses ...... 33
Step 4: Generate your samples. Calculate the test statistic ...... 34
Step 5: Apply the decision rule. Make a decision. State your conclusion in words ...... 34
Incorrect Decisions In Hypothesis Testing ...... 34
A Calculation For β and 1−β ...... 40
Sample Size Issues ...... 42
Behavior Of Observations Having A Lognormal Distribution ...... 42
Small Sample Sizes And Parent Distribution Departing From Normality ...... 44
An Experimental Design: Set-Up For Generating Lognormal Parameter Estimates ...... 44
Parameter Estimators For A Lognormal Distribution ...... 45
Getting Parameter Estimates: Probability Plotting ...... 46
Land’s Approach To Get A Confidence Interval ...... 48
Dealing With Censored Data Sets ...... 49
Getting Parameter Estimates: Censored Data And Probability Plotting ...... 51
Strategies To Determine The Proper Number Of Samples ...... 51
Sample Size Based on of the Sample Mean ...... 52
Sample Size Based on Margin of Error of the Sample Mean ...... 52
Sample Size Based on Relative Error of the Sample Mean ...... 54
Nonparametric Statistical Tests ...... 55
The Mann-Whitney Test ...... 56
Summary ...... 56
References ...... 57

How Does A Statistic Like A Sample Mean Behave?

Motivation: You are at a wetland site, and you would like to get an estimate of the true mean level of lead concentration in the soil. (Unknown to you, suppose that the population mean -- the true overall mean level of lead concentration in the soil -- is 40 mg/kg. Suppose also that the standard deviation of an infinite number of measures is 15 mg/kg. And suppose that the distribution of this infinite number of measures is not normal but skewed to the right.)

Q: How do you proceed in generating your estimate?

A: You might rely on an experimental design whereby you walk in a straight line across the site and take a new soil sample every so many meters. You repeat this process for lines that are parallel to the original. When sufficient parallel lines are walked off, the process is repeated in the perpendicular direction. Ultimately, you generate a “sufficient” number of samples that you believe characterize the soil conditions for this particular wetland site. Suppose that the number of samples that you take is n=15. Here are their ordered measurements, in mg/kg:

17.1 21.4 23.3 23.4 24.7 25.6 26.9 26.9 27.3 29.2 33.1 37.3 44.9 49.8 58.8

Q: For these 15 samples, what would be an overall representative measure of lead concentration in the soil?

A: The sample mean X̄. For these 15 observations, X̄ = 31.32 mg/kg.
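As a quick check, here is a minimal Python sketch (assuming NumPy is available) that reproduces this point estimate; the sample standard deviation, which the notes use later, is computed the same way:

```python
import numpy as np

# The 15 ordered lead measurements (mg/kg) listed above.
lead = np.array([17.1, 21.4, 23.3, 23.4, 24.7, 25.6, 26.9, 26.9,
                 27.3, 29.2, 33.1, 37.3, 44.9, 49.8, 58.8])

x_bar = lead.mean()       # the sample mean, our point estimate of mu
s = lead.std(ddof=1)      # the sample standard deviation (n-1 divisor)
print(f"n = {lead.size}, X-bar = {x_bar:.2f} mg/kg, s = {s:.2f} mg/kg")
# Prints X-bar = 31.31 mg/kg from these rounded values; the text's 31.32
# comes from the unrounded measurements.
```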

Now, suppose a colleague was requested to generate 15 soil samples at this same site using the same experimental design. Assume that she does not know where you walked off your first line, so that she starts her walk at a different spot than you.

Q: Would you expect her to get the same 15 numbers for her soil samples as you?

A: No.

Q: Would you expect her sample mean -- based upon her 15 samples -- to be the same as your sample mean?

A: No.

Q: Why not?

A: Because sampling involves error; i.e., we never incorporate all aspects of the phenomenon that we are attempting to measure.

Suppose an additional 498 of your colleagues were asked to repeat this experiment.

Q: How many sample means will have been generated in total, beginning with yours?

A: 500

Suppose you were asked to construct a histogram for these 500 sample means. Notice that you are being asked to construct a histogram for sample means, not for individual observations.

Q: Where would you expect the histogram to be centered?

A: Around µ =40 mg/kg, which is the true mean.

Notice that these 500 sample means will have a spread, i.e., a standard deviation.

Q: Will the standard deviation of the sample means be related to the standard deviation of the individual observations, i.e., to σ =15 mg/kg?

A: This is hard to tell, but the answer is “yes”.

Recall that the distribution of the individual observations was said to be skewed.

Q: What will the shape of the histogram for the sample means look like? Will it be skewed as it is for the individual observations?

A: You are inclined to say that it will be skewed, because the individual observations upon which it is based have a skewed distribution. But this is not correct! It will look more normally distributed than skewed! We will demonstrate this shortly.

Suppose we return to the beginning of the experimental design. Rather than generating 15 soil samples, each of the 500 individuals is asked to generate 30 soil samples and to calculate sample means, each based upon 30 samples rather than 15.

Q: Where would the histogram be centered?

A: Around µ =40 mg/kg.

Q: What would be the spread of these 500 new sample means, where each sample mean is based upon a larger number of observations?

A: Smaller than the spread of the previous 500 sample means, each of which was based on fewer observations.

Q: Why is this so?

A: Because the sample means that we now calculate are each based on twice the amount of information. Since each uses more information, each should be closer to the true mean, the item which each is designed to represent. If they are all collectively closer to µ , they have less spread around µ; i.e., they have a smaller standard deviation.

Q: What will be the shape of the histogram for these 500 sample means, each of which is based upon 30 observations? Will the histogram be skewed?

A: No, it will be (approximately) normal! This is guaranteed by something called the Central Limit Theorem (CLT).

Are you skeptical? We will demonstrate with a computer simulation. But first, a summary.

The Central Limit Theorem

Begin with a definition:

Sampling distribution: the probability distribution of a sample statistic, like the sample mean.

Central Limit Theorem: If all possible random samples, each of size n, are taken from any population with a mean µ and standard deviation σ, the sampling distribution of sample means will:

1. Have a mean (µX̄) equal to µ. Note the new notation.

2. Have a standard deviation (σX̄) equal to σ/√n. From here on, the standard deviation of the sample mean will go by the special name standard error of the mean.

3. Be normally distributed when the parent population is normally distributed, or be approximately normally distributed for samples of size 30 or more when the parent population is not normally distributed. The approximation to the normal distribution improves with samples of larger size.

In short, the Central Limit Theorem states the following:

1. µX̄ = µ; the mean of the X̄s equals the mean of the Xs.

2. σX̄ = σ/√n; the standard error of the mean equals the standard deviation of the population divided by the square root of the sample size.

3. The sample means are:

— normally distributed if the parent population is normal.

— approximately normally distributed regardless of the shape of the parent population if n ≥ 30, and the approximation improves as n gets larger.

NOTE: The n referred to in the Central Limit Theorem is the number of items sampled (or the number of samples taken). It is commonly referred to as sample size.

Let's look at the behavior of X̄ using a computer simulation. A visual example like this provides insight about what the Central Limit Theorem accomplishes (without having to rely on mathematical proofs).

Consider a parent population consisting of items from an exponential distribution. The mean of all of the observations is µX = 4, and the standard deviation is σX = 4.

The simulation will be done using Minitab and is represented with the following schematic diagram.

How The Sample Mean Behaves: An Experiment Using Minitab

[Schematic of the Minitab experiment. The parent population is an exponential distribution with µX = 4 and σX = 4. Each sampling experiment feeds a spreadsheet of sample means to the workings of the truly wondrous Central Limit Theorem.]

Sampling Experiment 1: 10,000 samples, each of size n=4 (10,000 rows and 4 columns, C11-C14), giving spreadsheet XBAR4 with 10,000 sample means. Three items to check: the shape of the distribution; µX̄ should be around 4; σX̄ should be around σ/√n = 4/√4 = 2. Range of XBAR4: from ____ to ____.

Sampling Experiment 2: 10,000 samples, each of size n=40 (C101-C140), giving spreadsheet XBAR40 with 10,000 sample means. Same three items: the shape of the distribution; µX̄ should be around 4; σX̄ should be around 4/√40 = 0.632. Range of XBAR40: from ____ to ____.

Sampling Experiment 3: 10,000 samples, each of size n=400 (C201-C600), giving spreadsheet XBAR400 with 10,000 sample means. Same three items: the shape of the distribution; µX̄ should be around 4; σX̄ should be around 4/√400 = 0.2. Range of XBAR400: from ____ to ____.
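The Minitab runs themselves are not reproduced here, but the same experiment is easy to repeat; this is a minimal Python sketch (assuming NumPy) of the three experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 4.0, 4.0   # an exponential distribution has mean = std. dev. = its scale

for n in (4, 40, 400):
    # 10,000 samples of size n, one sample per row; the row means are the X-bars.
    xbars = rng.exponential(scale=mu, size=(10_000, n)).mean(axis=1)
    print(f"n={n:4d}:  mean of X-bars = {xbars.mean():.3f} (CLT: {mu}),  "
          f"std. error = {xbars.std(ddof=1):.3f} (CLT: {sigma / np.sqrt(n):.3f}),  "
          f"range = {xbars.min():.2f} to {xbars.max():.2f}")
```

A histogram of each set of 10,000 means shows the skewness of the parent population disappearing as n grows, exactly as the theorem promises.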

The Standard Normal Distribution

Any normally distributed random variable (like X̄) can be transformed into another normally distributed random variable that always has a mean of zero and a standard deviation of 1. This is called standardizing, and the transformed variable is called standard normal z. Given a value for the population mean (µ), the population standard deviation (σ), a value for the sample mean (X̄), and the sample size (n) used in calculating X̄, we are able to calculate z as:

z = (X̄ − µ) / (σ/√n)

The reason for the transformation is that probabilities have been calculated for all possible values of z, and these are presented in the z-table. So, if one desires to find the probability associated with values of X , simply transform to z, and use the z table.
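As an illustration, here is a small sketch of the transformation in Python (SciPy's normal distribution is assumed; the sample mean of 45 mg/kg is hypothetical, while the population values are those of the lead example):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 40.0, 15.0, 30   # population mean and std. dev.; sample size
x_bar = 45.0                    # a hypothetical sample mean

z = (x_bar - mu) / (sigma / sqrt(n))   # standardize the sample mean
p = norm.sf(z)                         # P(Z > z), the upper-tail area
print(f"z = {z:.2f}, P(X-bar > {x_bar}) = {p:.4f}")
# z is about 1.83; a sample mean this far above mu occurs about 3.4% of the time.
```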

From another perspective, look closely at the numerator of the transformation. It measures the distance between our calculated sample mean and a (hypothetical or otherwise) population mean. It seeks an answer to the eventual question: Is our sample statistic close to or far from the universal norm µ ? This is an issue of utmost concern in applied work.

Once the numerator is calculated, the next question ought to be: Is this calculated distance a big number or is it a small number? The answer to this question is -- “It’s all relative.”

The numerator is large or small relative to some standard. The standard presents itself in the denominator of z. The numerator is large or small relative to a measure of the spread of all of the data. This measure is the standard deviation.

So, for example, a large z-value translates into the sample mean being “far” from the population mean. This has genuine implications for decision making. It may suggest that some sort of corrective action be taken.

The initial requirement of statistical estimation and hypothesis testing is a firm handle on all the pieces to the puzzle -- sample mean, population mean, population standard deviation, z-transformation, calculating probabilities from the z-table using the z-transformation -- and how they relate to each other. The next thing to do is to manipulate the value of one of these items and observe what happens to the values of the rest. Statistical estimation is concerned with these manipulations.

Statistical Estimation

Overview:

We are now in the first phase of statistical inference. The setting for the problems that we solve is as follows:
⋅ You are given an X̄, n, and σ, but you do not know (i.e., are not given) µ.
⋅ You want a reliable estimate of µ. (You want this estimate because important decisions are going to be made on the basis of your result.)

⋅ What should you do?

Begin by exploring the behavior of X̄. Recall that its behavior is explained by the Central Limit Theorem.

What is X̄? It is:
⋅ a sample statistic;
⋅ a random variable; i.e., it can take on any value since it is sensitive to sampling;
⋅ an estimator of µ.

Note the picture below. It is a sampling distribution of X̄, where each of the millions of possible X̄s is based on a sample size of n. (The sample size is the same for each X̄, but the observations comprising each X̄ are most likely different.) Suppose that the observations that we draw lead to the particular X̄ on the horizontal axis below.

[Figure: the sampling distribution of X̄, centered at µ, with our particular X̄ marked on the horizontal axis at some distance from µ.]

Unfortunately, we rarely know the location of µ. It is placed in the picture above simply as a point of reference. Given the probable distance between the unknown population parameter (e.g., µ) and its estimator (e.g., X̄), it makes sense to distinguish two types of estimators:

⋅ point estimator: one value is specified as the estimate of the population parameter; e.g., X̄ is a point estimator for µ; the specific value of X̄, e.g., 31.32 mg/kg, is a point estimate. Also, s is a point estimator for σ. The specific value of s, e.g., 11.59 mg/kg, is a point estimate.

⋅ interval estimator: a range of values that conceivably contains the true population parameter.

The motivation for an interval estimator is as follows. Consider the original picture above.

It is risky to assert with 100% certainty that a particular value of X̄ (a point estimate) will equal µ.

[Figure: the sampling distribution again, with µ at its center and our particular X̄ off to one side.]

Why not admit that:

    point estimator ± some distance    contains or equals    population parameter

Q: What is a good candidate for “some distance”?

A: An excellent candidate is derived as follows:

⋅ Begin with the z-transformation formula: z = (X̄ − µ)/(σ/√n) = (X̄ − µ)/σX̄

⋅ Multiply both sides by σ/√n = σX̄: z·σX̄ = X̄ − µ

⋅ (Recall that z itself is simply a number from the z-table, typically 1, 2, or 3, or any number between 0 and 3.90).

The result directly above says that the distance between the sample statistic and the population parameter can be characterized as an arbitrarily selected number (e.g., z = 1 or perhaps 2 or perhaps 3 or any number between 0 and 3.90) of standard errors (σX̄), so that the interval presented above, which is:

    point estimator ± some distance    contains    population parameter

takes the form:

    X̄ ± z·σX̄ = µ

Thus, the item that brings X̄ into equality with µ is z·σX̄. This quantity is typically referred to as the margin of error due to sampling or the maximum error of the estimate.

A more formal way of expressing the interval mathematically is:

X̄ − z·σX̄ ≤ µ ≤ X̄ + z·σX̄

In words, the formula above says that µ is contained in the interval bounded on the left by the calculated sample mean minus a number (z) of standard errors and on the right by the calculated sample mean plus a number (z) of standard errors.

An important issue is the following:

Q: Once X̄ and σX̄ are specified, what determines the size or width of the interval?

A: The z-value.

Q: Who controls the magnitude of z?

A: You--the decision maker--control z.

Q: What does a large, relative to a small, value of z do to the width of the interval?

A: It increases the width of the interval.

Q: What is it about a larger z-value that causes the interval to become wider?

A: A larger positive quantity is subtracted from the left lower bound (thus expanding its limit to the left) and added to the right upper bound (thus expanding its limit to the right).

Q: How would an interval estimate look for a given X̄, a given σX̄, and a small versus large z-value?

A:

[Figure: for the same X̄ and σX̄, two interval estimates centered at X̄. The interval with a small z is narrow, and µ is not contained in it; the interval with a large z is wide, and µ is contained in it.]

The importance of z:

⋅ For a given X and σX , interval width depends on the value of z selected. ⋅ z itself is obtained from the z-table; so it can range from 0 to 3.9. ⋅ As z gets larger, the interval becomes wider. ⋅ Compare the drawings of the two intervals in the previous picture: ⋅ The wider the interval becomes, the more confident we are that the interval contains µ . ⋅ The narrower the interval becomes, the less confident we are that the interval contains µ . ⋅ Beliefs--weak or strong--that the interval contains µ suggest a name for the interval itself: confidence interval.

A confidence interval represents our belief of plausible values that the unknown population mean can have.

An important issue: the trade-off between confidence and precision:

Q: If we desire to be totally confident that the interval contains µ , why not make the interval as wide as possible; i.e., use a z-value of 3.9?

A: Notice that a confidence interval is our statement about conceivable values for the unknown population parameter. A wide interval permits extreme values as estimates for µ . These extreme values may be terrible estimates. Terrible estimates do not introduce helpful information to the decision making process.


Notice that, with a wide confidence interval, we are assuring ourselves that the interval contains µ , but we are not narrowing down--not pinpointing--reasonable estimates. So, increased confidence comes at the cost of decreased precision.

Also, with a narrow confidence interval, we become less assured that the interval contains µ , but we are narrowing down--pinpointing--reasonable estimates. So, increased precision comes at the cost of decreased confidence.

Another important issue: translating z into a confidence level:

⋅ z is an integral part of the confidence interval formula. Confidence increases as z increases.
⋅ z-values are associated with probabilities.
⋅ This relationship between z and probabilities now transfers to confidence intervals.
⋅ Preview: Larger z’s result in both larger probabilities (from the z-table) and larger or wider confidence intervals. These probabilities are referred to as levels of confidence. So, a level of confidence is a probability assessment. It is a probability assessment of our confidence that the interval contains µ.
⋅ Look at the relationship: (a) between positive z and an area under the standard normal curve; (b) between ±z and its corresponding symmetrical, interior area; and (c) between this symmetrical interior area and its equivalent level of confidence.

[Figure: for each row of the table below, three normal curves show the area from 0 to z, the symmetrical interior area from −z to +z, and the corresponding level of confidence.]

    ± z        area from 0 to z    symmetrical interior area    level of confidence
    ± 1.645    0.45                0.90                         0.90
    ± 1.96     0.475               0.95                         0.95
    ± 2.575    0.495               0.99                         0.99

⋅ Thus, a 90% confidence interval uses 1.645 as the z-value in the confidence interval formula, a 95% confidence interval uses 1.96, and a 99% confidence interval uses 2.575.

⋅ After we construct a 90% confidence interval, the proper interpretation is: “We can be 90% confident that the true mean is within this interval.”

Look at the previous diagram, and take note how the normal curves display symmetrical interior areas. We will now generalize these pictures with the notational convention used by statisticians. We begin by recognizing that the total area under the normal curve is 1 or 100%.

For each component of the normal curve, the corresponding notation is as follows:

component: notational convention and explanation

⋅ symmetrical interior area: 1−α, the level of confidence or confidence level.
⋅ total area in both tails combined: α, the tail area = total area − interior area; α = 1 − (1−α).
⋅ area in right tail: α/2 (tails are symmetric, so each is α/2).
⋅ area in left tail: α/2 (areas are positive; each α/2 is positive).
⋅ z-value separating the right-half symmetric area and the right tail: zα/2, whose subscript matches the right tail area (z is positive because it is to the right of z=0).
⋅ z-value separating the left-half symmetric area and the left tail: −zα/2, whose subscript matches the left tail area (z is negative because it is to the left of z=0).

The picture that captures everything above is as follows:

[Figure: the standard normal curve with interior area 1−α between −zα/2 and +zα/2 and area α/2 in each tail.]

The way to rewrite the original confidence interval formula for a (1−α) level of confidence is:

X̄ − zα/2·σX̄ ≤ µ ≤ X̄ + zα/2·σX̄

The difference between this rewrite and the original confidence interval is the α/2 subscript on z. Don’t let this confuse you. Nothing different has been done in the construction of the interval. The α/2 subscript on z is presented to accompany the level of confidence, 1−α. These are nothing more than notational matches. After all, if 1−α (i.e., the symmetric interior area or the level of confidence) changes, the corresponding z-values (which are ±zα/2) likewise must change.

Example: Find ±zα/2 for a 90% confidence interval.

Solution: Follow the method under “notational convention and explanation” above.

(1) This is your start. You are given 1−α = 0.90. Now (2) to (5) below are straightforward.
(2) α = 1 − 0.90 = 0.10
(3) α/2 = 0.10/2 = 0.05
(4) zα/2 = z0.05 = 1.645
(5) −zα/2 = −z0.05 = −1.645

Thus, the z-values for a 90% confidence interval are ±1.645.
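The same lookup can be done in code; a small sketch using SciPy's inverse normal CDF (the percent-point function):

```python
from scipy.stats import norm

conf = 0.90                    # 1 - alpha, the desired level of confidence
alpha = 1 - conf               # 0.10
z = norm.ppf(1 - alpha / 2)    # z_(alpha/2): leaves alpha/2 = 0.05 in the right tail
print(f"z for a {conf:.0%} confidence interval: ±{z:.3f}")   # ±1.645
```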

Let's consider a heavy metal different than lead. Suppose that we have 15 chromium samples. Their measurements (in mg/kg) are:

3.5199 6.5252 8.4996 6.4097 4.7424 5.5125 5.9328 2.6428 6.7628 5.3015 4.1472 3.4474 5.9564 5.4219 9.4118

On the basis of the 15 samples (n=15), X̄ = 5.616 mg/kg and s = 1.827 mg/kg. For now, assume that σ = s = 1.827 mg/kg. Given this information, construct a 90% confidence interval for the unknown population mean and provide an interpretation for this confidence interval.

The interval itself: X̄ − zα/2·(σ/√n) ≤ µ ≤ X̄ + zα/2·(σ/√n)

Substitutions into our 90% confidence interval (n=15):

5.616 − 1.645·(1.827/√15) ≤ µ ≤ 5.616 + 1.645·(1.827/√15)
5.616 − 0.776 ≤ µ ≤ 5.616 + 0.776
4.840 ≤ µ ≤ 6.392

Interpretation: We can be 90% confident that the true mean is somewhere between 4.84 mg/kg and 6.392 mg/kg.
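A sketch of the same z-based interval in Python (still assuming σ = s, as the text does):

```python
import numpy as np
from scipy.stats import norm

chromium = np.array([3.5199, 6.5252, 8.4996, 6.4097, 4.7424,
                     5.5125, 5.9328, 2.6428, 6.7628, 5.3015,
                     4.1472, 3.4474, 5.9564, 5.4219, 9.4118])

n = chromium.size
x_bar = chromium.mean()            # 5.616 mg/kg
s = chromium.std(ddof=1)           # 1.827 mg/kg, standing in for sigma
z = norm.ppf(0.95)                 # 1.645 for a 90% interval
moe = z * s / np.sqrt(n)           # margin of error, z times the standard error
print(f"90% CI: {x_bar - moe:.3f} <= mu <= {x_bar + moe:.3f}")
# 4.840 <= mu <= 6.392, matching the hand calculation.
```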

Alternative interpretation: Suppose 15 samples are taken, X̄ is calculated, and a 90% confidence interval is constructed. Suppose that this “experiment” is repeated nine more times, so that we have constructed a total of ten 90%-confidence intervals (each one relying on a new X̄, and each X̄ calculated from 15 new samples). Here is the alternative interpretation: of these ten similarly constructed intervals, nine (or 90%) can be expected to contain the true mean µ.

The pictorial representation of this interpretation is as follows:

[Figure: Notion of a 90% Confidence Interval. The top panel shows the sampling distribution of X̄, with interior area 0.90 and tail areas of 0.05 on each side; a value of z.05 = 1.645 implies a 45% area to either side of the mean (2 × 45% = 90% symmetrical area in total). The standard error is defined as σX̄ = σ/√n. The bottom panel shows ten intervals X̄i ± z.05·σX̄, one from each of ten repetitions of the experiment. While each of the point estimates X̄i does not equal µ, the intervals constructed around 9 of the 10 X̄i's do contain µ; this is the meaning of a 90% confidence interval. A 100% confidence interval could be obtained by changing the z-value to 3.9 (z0 = 3.9 implies 50% of the area to either side of the mean); this stretches the band around each X̄i, so that the band around X̄7, the one interval that missed µ, would now include it.]

One final issue regarding confidence intervals:

Q: Does there have to be a trade-off between confidence and precision? For example, does an increase in precision ( i.e., a narrower interval) have to come at the cost of a decrease in confidence?

A: The answer is no. The reason goes something like this:

Q: If you take more samples, would you expect your calculated sample mean to be a better representation of µ? Alternatively, would you expect your X̄ to be closer to µ?

A: Yes! This is a consequence of the Central Limit Theorem.

Q: As you increase the number of samples, what should happen to the width of your confidence interval for a given level of confidence?

A: It will decrease in width. Alternatively, it becomes more precise.

Q: What is responsible for this increase in precision, given the confidence interval formula?

A: n in the denominator of σX̄. (Recall that σX̄ = σ/√n.)

Example: Take the previous 90% confidence interval that we calculated. To see what happens, let X̄ and s (assumed to equal σ) remain unchanged. However, increase n from 15 to 25, and calculate the interval.

90% confidence interval with increased n (n=25):

X̄ − zα/2·(σ/√n) ≤ µ ≤ X̄ + zα/2·(σ/√n)
5.616 − 1.645·(1.827/√25) ≤ µ ≤ 5.616 + 1.645·(1.827/√25)
5.616 − 0.6 ≤ µ ≤ 5.616 + 0.6
5.016 ≤ µ ≤ 6.216

Interpretation: We can be 90% confident that the true mean is somewhere between 5.016 mg/kg and 6.216 mg/kg.

Notice that this interval is slightly narrower than the previous interval. To narrow an interval in this fashion is to increase its precision. Also notice that we have maintained the same level of confidence (at 90%) and have increased precision entirely by increasing sample size.

On the surface, it appears that the almost unnoticeable increase in precision came at the high cost of generating 10 additional samples. Unrealistically, we let X and s be the same for the two intervals. They will obviously be different for different sample sizes. This was done strictly to direct attention to the relationship between sample size and precision for a given level of confidence.

The t-distribution

Thus far, z has been the driving force for making probability assessments about an unknown population mean by way of a confidence interval.

Recall the strict definition of z. It is:

z = (X̄ − µ) / (σ/√n)

The characteristics of z are:

⋅ z has one shape, the standard normal;
⋅ z is centered at zero, and it has a standard deviation equal to one;
⋅ As long as the true population standard deviation (σ) is known or given, an X̄ can be converted to z.

In practical situations, we simply do not have σ . We do, however, have its estimator, which is s. Now, return to the right-hand side of the formula for z. If we substitute the item that we have (s) for the item that we do not have (σ ) in the formula itself, the result is:

(X̄ − µ) / (s/√n)

Appropriate questions now surface:

Q: Is this z anymore?

A: No, because it violates the strict definition of z.

Q: But I have been using z right along to make probability assessments. Do you mean to tell me that I should not have been using z to make these assessments?

A: I am telling you that, when you are in an applied situation and you must use a calculated standard deviation in place of the true, unknown standard deviation, the variable with which you are dealing is no longer distributed as z.

Q: Well, if it isn’t z, what then is it, and how do I use it?

A: It is a well-defined probability distribution. It is called the t-distribution, and it is used just like z. Its formula is:

t = (X̄ − µ) / (s/√n)

Q: Is there a relationship between z and t?

A: Yes, there is. It can be appreciated in terms of the following argument. First, pay attention to the item in each formula that makes z and t different. We expect a sample statistic (like s) to look more like the population parameter it is designed to estimate ( σ ) the more we sample (as n increases). Comparing the formulas directly, t starts to look more and more like z as n increases. This happens because s becomes a better estimate of σ . Ultimately, as n increases s becomes σ , and t becomes z.

Appearance Of The t-distribution

The t-distribution has a number of shapes. It has a different shape for each sample size (n). When using t, sample size is converted to something called degrees of freedom (df). The relationship between n and df is df=n-1. This is read “Degrees of freedom is equal to sample size minus 1.” Thus, t likewise has a different shape for each df.

Pictures for three different t-distributions, representing samples of size 2, 20, and "large", are presented below. The t-distribution with n=2 has df=1; the t-distribution with n=20 has df=19; and the t-distribution with "large" n has "large" df. As sample size above increases beyond 30, the t-distribution approximates the normal distribution better and better.

[Figure: three t-curves centered at zero: t with 1 df (flattest), t with 19 df, and the standard normal with “large” df.]

From these pictures, we see that each t-curve is centered at zero. It is also symmetric about zero. Also, t is considerably flatter and more spread out than the z-curve for small values of n; t effectively becomes z as n gets large.

Recall that, when we used z in the confidence interval formula, we attached a subscript on z to match the tail area associated with z. Thus, if the tail area was of magnitude α/2, the z-value that corresponded with this tail area was denoted as zα/2. We now do the same with t. We will focus on the upper tail area of t and refer to this tail area as α. The t-value corresponding to the tail area α will be referred to as tα.

The relationship between a tail area (α) and its corresponding t-value (tα) is presented in the picture below for our three t-curves. Notice the t-axis in the picture. With zero as the reference point on the t-axis, each tα presented on that axis ought to become larger as we move in the positive direction, i.e., as df decreases. The place to verify this is a statistical table constructed specifically for the t-distribution. Such a table is the t-table presented on page 17. In addition, notice that the shaded areas in the left portion of the t-curves below would have t-values that are simply the negatives of the tα-values in the right portion of the curve.

[Figure: the same three t-curves (standard normal with “large” df, t with 19 df, t with 1 df), each with upper-tail area α shaded to the right of its tα; for a fixed α, tα sits farther from zero as df decreases.]

Now, let’s examine how to use the t-table presented on page 17. First, notice that df is presented as the left-most column. It is reproduced as the right-most column to facilitate reading values from the table. Notice the top row of the table. The subscripts on the t-values presented represent the magnitudes of the tail areas. From left to right, these are 0.10, 0.05, 0.025, 0.01, and 0.005. The entries in each interior column of the table are the t-values for a given tail area (read from the top of the column where the entry appears) and for a given df (read from the row in which the entry appears). Finding a t-value for a given α and df simply involves finding the intersection of these row and column entries in the body of the table.

Example:

If n=2 and α =0.05, find df and tα .

Solution:

Since n=2, df=n-1=2-1=1. Since α =0.05, we are looking for t0.05 corresponding with df=1. To solve this problem, go down the df column and locate “1". Go across the top row and locate t0.05. The solution is the intersection of this row and this column in the body of the table; i.e., t0.05=6.314.

On your own, verify that, for α =0.05 and df=19, t0.05=1.729. For α =0.05 and df=1000, verify that t0.05=1.646.
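These lookups can also be checked in code; a short sketch using SciPy's t distribution:

```python
from scipy.stats import t

# t-value leaving alpha = 0.05 in the upper tail, for several df:
for df in (1, 19, 1000):
    print(f"df = {df:4d}:  t_0.05 = {t.ppf(0.95, df):.3f}")
# df=1 gives 6.314, df=19 gives 1.729, df=1000 gives 1.646,
# matching the t-table entries cited above.
```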

Situation Where We Use t In Place Of z: Confidence Intervals

Let’s look at the first situation where t exhibits its practicality. Return to the 15 chromium samples for which we calculated a confidence interval for µ. Recall that, on the basis of 15 samples (n=15), X̄ = 5.616 mg/kg and s = 1.827 mg/kg. At the same time, we assumed that σ = s = 1.827 mg/kg, and we constructed a 90% confidence interval for the unknown population mean.

Recall the formula for a (1−α) confidence interval when using z:

X̄ − zα/2·(σ/√n) ≤ µ ≤ X̄ + zα/2·(σ/√n)

Since we do not have σ , we should replace σ and z with s and t, respectively, in the confidence interval formula. This yields:

X̄ − tα/2·(s/√n) ≤ µ ≤ X̄ + tα/2·(s/√n)

Calculating the 90% confidence interval is straightforward once we find the proper values for ±tα/2. To do so, revisit the suggestions beginning at the bottom of page 10. Be aware that the (1−α) confidence interval formula requires us to find ±tα/2. Also notice that df=n-1=15-1=14 for this problem.

(1) 1−α = 0.90
(2) α = 1 − 0.90 = 0.10
(3) α/2 = 0.10/2 = 0.05
(4) tα/2 = t0.05 = 1.761 for df=14 (see the t-table)
(5) −tα/2 = −t0.05 = −1.761 for df=14

Making the substitutions, we get:

X̄ − tα/2·(s/√n) ≤ µ ≤ X̄ + tα/2·(s/√n)

5.616 − 1.761·(1.827/√15) ≤ µ ≤ 5.616 + 1.761·(1.827/√15)

5.616 − 0.831 ≤ µ ≤ 5.616 + 0.831

4.785 ≤ µ ≤ 6.447
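The same interval in code, with t replacing z (a sketch using SciPy):

```python
import numpy as np
from scipy.stats import t

chromium = np.array([3.5199, 6.5252, 8.4996, 6.4097, 4.7424,
                     5.5125, 5.9328, 2.6428, 6.7628, 5.3015,
                     4.1472, 3.4474, 5.9564, 5.4219, 9.4118])

n, x_bar, s = chromium.size, chromium.mean(), chromium.std(ddof=1)
t_val = t.ppf(0.95, df=n - 1)      # t_0.05 with 14 df = 1.761
moe = t_val * s / np.sqrt(n)       # margin of error using s, not sigma
print(f"90% CI: {x_bar - moe:.3f} <= mu <= {x_bar + moe:.3f}")
# 4.785 <= mu <= 6.447; slightly wider than the z interval, as expected.
```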

t-table

df      t0.10   t0.05   t0.025   t0.01    t0.005     df
1       3.078   6.314   12.706   31.821   63.657     1
2       1.886   2.920   4.303    6.965    9.925      2
3       1.638   2.353   3.182    4.541    5.841      3
4       1.533   2.132   2.776    3.747    4.604      4
5       1.476   2.015   2.571    3.365    4.032      5
6       1.440   1.943   2.447    3.143    3.707      6
7       1.415   1.895   2.365    2.998    3.499      7
8       1.397   1.860   2.306    2.896    3.355      8
9       1.383   1.833   2.262    2.821    3.250      9
10      1.372   1.812   2.228    2.764    3.169      10
11      1.363   1.796   2.201    2.718    3.106      11
12      1.356   1.782   2.179    2.681    3.055      12
13      1.350   1.771   2.160    2.650    3.012      13
14      1.345   1.761   2.145    2.624    2.977      14
15      1.341   1.753   2.131    2.602    2.947      15
16      1.337   1.746   2.120    2.583    2.921      16
17      1.333   1.740   2.110    2.567    2.898      17
18      1.330   1.734   2.101    2.552    2.878      18
19      1.328   1.729   2.093    2.539    2.861      19
20      1.325   1.725   2.086    2.528    2.845      20
21      1.323   1.721   2.080    2.518    2.831      21
22      1.321   1.717   2.074    2.508    2.819      22
23      1.319   1.714   2.069    2.500    2.807      23
24      1.318   1.711   2.064    2.492    2.797      24
25      1.316   1.708   2.060    2.485    2.787      25
26      1.315   1.706   2.056    2.479    2.779      26
27      1.314   1.703   2.052    2.473    2.771      27
28      1.313   1.701   2.048    2.467    2.763      28
29      1.311   1.699   2.045    2.462    2.756      29
30      1.310   1.697   2.042    2.457    2.750      30
35      1.306   1.690   2.030    2.438    2.724      35
40      1.303   1.684   2.021    2.423    2.704      40
50      1.299   1.676   2.009    2.403    2.678      50
60      1.296   1.671   2.000    2.390    2.660      60
70      1.294   1.667   1.994    2.381    2.648      70
80      1.292   1.664   1.990    2.374    2.639      80
90      1.291   1.662   1.987    2.369    2.632      90
100     1.290   1.660   1.984    2.364    2.626      100
1000    1.282   1.646   1.962    2.330    2.581      1000

z       1.282   1.645   1.960    2.326    2.576
        z0.10   z0.05   z0.025   z0.01    z0.005

A comparison of the intervals using z and t is as follows:

90% confidence interval using z: 4.840 ≤ µ ≤ 6.392
90% confidence interval using t: 4.785 ≤ µ ≤ 6.447

The interval using t is slightly wider than when using z for the same level of confidence. The item responsible for the increase in width is the t-value. (Compare t=1.761 to z=1.645).

Recall that a wider interval for the same level of confidence means a loss in precision. When comparing the limits of the two intervals above, the loss in precision when using t is negligible. This is due to the small value of s in the first place. When n itself is small or s itself is large or when we have a combination of the two, the differences in widths between the two intervals can be dramatic. This loss in precision is the penalty for not having the true standard deviation, only its estimate, in the interval. In fact, this penalty is more severe the smaller is n and less severe the larger is n. To see this, visit the t0.05 column in the t-table. Start with our entry (t0.05=1.761) and notice how its value increases as n decreases and how its value decreases as n increases. This former set of circumstances results in ever-widening intervals, the latter in ever-narrowing intervals. When n gets really large, t and z give virtually identical results.

An Upper One-Sided (1-α) Confidence Interval For µ

We have just seen that a two-sided (1-α) confidence interval for µ is:

X̄ − tα/2·(s/√n) ≤ µ ≤ X̄ + tα/2·(s/√n).

Frequently, environmental data are skewed to the right, and decisions need to be made about observations in this end of the distribution. Observations in the left end of the distribution are not the focal point of the test. Thus, it becomes relevant to construct an upper one-sided (1-α) confidence interval for µ. This is nothing more than a two-sided confidence interval specified only in terms of its upper limit (UL) and with the entire α area placed in the right tail (as opposed to α/2 in the two-sided situation). Thus, the upper one-sided (1-α) confidence interval for µ becomes:

0 ≤ µ ≤ X̄ + tα·(s/√n).

It is apparent that the upper limit (UL) is:

UL = X̄ + tα·(s/√n).

The only difference between the upper limits for one-sided and two-sided confidence intervals is the value of t: tα/2 is used in the upper limit for a two-sided confidence interval, and tα is used in the upper limit for an upper one-sided confidence interval.

Another Confidence Interval Example

State of Connecticut Regulation, Department of Environmental Protection (Page 16 of 66: (e) Applying the Direct Exposure and Pollutant Mobility Criteria)

“Unless an alternative method for determining compliance with a direct exposure criterion has been approved by the Commissioner in writing, compliance with a direct exposure criterion is achieved when (A) the ninety-five percent upper confidence level of the arithmetic mean of all sample results of laboratory analyses of soil from the subject release area is equal to or less than such criterion, provided that the results of no single sample exceeds two times the applicable direct exposure criterion, or (B) the results of all laboratory analyses from the subject release area are equal to or less than the applicable direct exposure criterion.”

Taking Apart The Regulation

Important: Although the regulation is not specified in terms of constructing a two-sided confidence interval or an upper one-sided confidence interval, let's perform the calculations in terms of the latter.

Suppose we are focusing on arsenic levels, in mg/kg, for a particular site. Suppose that we take 20 samples (i.e., n=20). The observations are as follows:

10.20 4.17 1.92 17.80 6.34 1.55 15.60 3.10 7.81 6.90 2.06 4.72 5.73 14.10 9.18 7.78 4.66 7.63 4.28 10.40

Next, we calculate the sample mean (X̄ = 7.30), the sample standard deviation (s = 4.53), and the standard error of the sample mean (sX̄ = s/√n = 1.01). We use the t-table to get the appropriate t-value for the required upper 95% confidence interval. Consulting the t-table, we need t0.05 for df=n-1=20-1=19. This t-value is 1.729.

Suppose, further, that the direct exposure criterion for arsenic is set at 10 mg/kg. Reading (A) and (B) of the regulation, we have the following possible situations:

(1) From (B), if all samples are ≤ 10 mg/kg, compliance results.
(2) From (A), if just one sample is ≥ 20 mg/kg, compliance fails.
(3) From (A), if some samples are between 10 and 20 mg/kg, with some below 10 mg/kg, compliance may result.

Make a decision about compliance based upon comparing the one-sided 95% upper confidence limit of the arithmetic mean of all the samples with the direct exposure criterion.
⋅ If this upper confidence limit is ≤ 10 mg/kg, compliance results.
⋅ If this upper confidence limit is > 10 mg/kg, compliance fails.

Q: Which of the three situations do we have?

A: Five samples are above 10 mg/kg, so we do not have Situation (1). No sample is above 20 mg/kg, so we do not have Situation (2). We have Situation (3), because five samples are between 10 and 20 mg/kg and 15 are below 10 mg/kg.

Since we have a situation of possible compliance, we proceed with calculating the upper limit (UL) for the upper one-sided 95% confidence interval

UL = X̄ + tα·(s/√n) = 7.30 + 1.729·(1.01) = 9.046 mg/kg.

Since UL = 9.046 mg/kg and this value is less than the direct exposure criterion of 10 mg/kg, compliance has been achieved.
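A sketch of the whole compliance check in Python, using the supposed 10 mg/kg criterion and the decision logic described above:

```python
import numpy as np
from scipy.stats import t

arsenic = np.array([10.20, 4.17, 1.92, 17.80, 6.34, 1.55, 15.60, 3.10,
                    7.81, 6.90, 2.06, 4.72, 5.73, 14.10, 9.18, 7.78,
                    4.66, 7.63, 4.28, 10.40])
criterion = 10.0    # the direct exposure criterion assumed in the example, mg/kg

if (arsenic > 2 * criterion).any():       # situation (2): any single sample > 20
    print("Compliance fails: a sample exceeds twice the criterion.")
else:
    n, x_bar, s = arsenic.size, arsenic.mean(), arsenic.std(ddof=1)
    ul = x_bar + t.ppf(0.95, df=n - 1) * s / np.sqrt(n)   # one-sided 95% UL
    verdict = "compliance achieved" if ul <= criterion else "compliance fails"
    print(f"UL = {ul:.3f} mg/kg -> {verdict}")
    # UL is about 9.05 mg/kg (9.046 in the text, which rounds the standard error).
```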


Summary And Words Of Caution When Using t Or z

All of the testing that we have done so far has focused on a particular sample statistic, the mean. We could likewise concentrate on other statistics like a median, a proportion, or a variance. The confidence intervals that we have constructed so far are based on normality of the sample mean. The Central Limit Theorem is the vehicle that guarantees the normality of the sample mean irrespective of the distribution of the observations upon which the sample mean is calculated. If the parent population itself is not highly skewed, normality of the mean kicks in at small sample sizes. If the parent is extremely skewed, normality kicks in around a sample size of 30. We have gone through a computer simulation to demonstrate the behavior of the sample mean based upon a population that was extremely skewed.

We have also made a distinction between when to use the z-distribution and when to use the t- distribution. We concluded that the more realistic of the two is the t-distribution because we never know the true population standard deviation. The best we can do is to estimate the population standard deviation. The t-distribution accommodates this estimate.

Procedures using t are based on the assumption that the observations come from a normal population. They work reasonably well when the observations are not normally distributed and the sample size is small (e.g., less than 15) or moderate (between 15 and 30), provided that the observations under consideration are not too far from being normally distributed. We say that these procedures are robust to violation of the normality assumption.

Treatment Of Outliers And Testing Suggestions

We must always be cautious of outliers, defined as observations that fall well outside the overall pattern of the data. An outlier may be the result of an error in recording or measurement; it may also be a genuinely unusual and extreme observation. Obviously, outliers call into question the normality assumption. Outliers may also affect our test procedures, because neither the sample mean nor the sample standard deviation is resistant to them; the effect is milder when the sample size is large, and for small sample sizes in particular these procedures are not robust to outliers. It is important to examine our data before applying test procedures, to ensure that the procedures are appropriate.

If an outlier is present and it is not the result of recording or measurement error, several things can be done. As a preliminary, apply the procedure to the data set with and without the outlier(s). If the difference is substantial, take an alternative course of action. Two possibilities are suggested: (1) If the data can be shown to abide by the characteristics of a different probability distribution, e.g., log-normal, transform the data to this distribution and conduct the tests according to the parametric assumptions of this distribution; (2) disregard searching for the correct parametric distribution and implement the appropriate non-parametric procedure.

A Simple Approach For Assessing Data Distribution And The Possibility Of Outliers

A Data Set’s Five-Number Summary And Box-And-Whisker Diagram (Or Boxplot)

A straightforward method for seeing how sample data are distributed begins with computing their quartiles. Furthermore, performing an additional calculation between the first and third quartiles (to obtain the interquartile range) assists with determining whether an observation might be an outlier.

Quartiles are numbers that divide the data set into four equal parts, i.e., quarters. In order to calculate quartiles, the observations must first be ordered by size, from smallest to largest. A data set has three quartiles, denoted as Q1, Q2, and Q3. The first quartile Q1 is the number that divides the bottom 25% of the data from the top 75%. The second quartile Q2 is the number that divides the bottom 50% of the data from the top 50%; this is also the median. Notice that the median is a measure of the middle of the ordered data set. The third quartile Q3 is the number that divides the bottom 75% of the data from the top 25%. By definition, quartiles depend strictly on observation order; they are not sensitive to observation size. The interquartile range (or IQR) is defined as the difference between the first and third quartiles; i.e., IQR = Q3 − Q1. It gives the range of the middle 50% of the data and is the preferred measure of spread when the median is used as the measure of center.

Obtaining quartiles is quite easy. Most importantly, begin by ordering the data. Then, obtain the median. At this point, the data are divided into two parts: a lower 50% and an upper 50%. Next, find the median of the lower 50%; this will be Q1. Finally, find the median of the upper 50%; this will be Q3. The resulting three numbers divide the data set into four parts that each contains 25% of the data.

Pictures of four common continuous distributions are presented in the top panels of the figure below. Quartiles are plotted on the horizontal axis of each parent distribution. The upper-left top panel shows that the distance between pairs of Qi's (and also between the minimum value and Q1 and between Q3 and the maximum value) is the same. This provides information that the distribution is uniform. Visual inspection of the remaining top panels suggests a different distribution, as these horizontal segments change in size. In particular, notice that the shorter the horizontal segment, the taller the respective portion of the continuous distribution.

[Figure: Four Parent Populations And Accompanying Box-And-Whisker Diagrams (Or Boxplots)]

Famous Princeton University statistics professor John Tukey took this concept and applied it to sample data. His creation was the box-and-whisker diagram, or boxplot. It makes use of a data set’s minimum, Q1, Q2, Q3, and maximum (also called a data set’s five-number summary) to provide a graphical display of the center (i.e., the median) and variation in a data set. Boxplots corresponding to the four different parent distributions in the pictures above are presented directly beneath each continuous distribution.

In each boxplot, the vertical line that divides the overall rectangle into two boxes represents the center of the ordered data set. This vertical line is the median or Q2. Notice that the quartiles establish the horizontal length of the boxes. In addition, a horizontal line connecting Q1 with the minimum value and a horizontal line connecting Q3 with the maximum value result in the whiskers. Horizontal distances of the whiskers and the boxes provide a preliminary idea of the distribution of the data. In particular, for a given boxplot, notice the horizontal lengths of the segments. The shorter the horizontal segment in the boxplot, the taller is the respective portion of the continuous distribution.

Upon constructing a boxplot for a given body of data, it is common to use the results to get a preliminary idea about the data’s distribution. The figure's boxplots provide insight for the possible patterns that a data set may have.

Interquartile Range (IQR) And Outliers

Quartiles and IQR can be used together to develop a general rule that is useful for identifying potential outliers. First define the following lower limit and upper limit:

Lower limit = Q1 − 1.5·IQR

Upper limit = Q3 + 1.5·IQR.

The benchmark rule is as follows: an observation that lies 1.5 IQRs below the first quartile or 1.5 IQRs above the third quartile is a potential outlier. If this happens, further data analysis should be done to determine the reason, if possible.
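A sketch of this rule as a small Python helper. Note that quartile conventions differ slightly across software; NumPy's default interpolation is assumed here, so the fences may differ a little from Minitab's:

```python
import numpy as np

def iqr_fences(data):
    """Return (lower limit, upper limit) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    q1, q3 = np.percentile(data, [25, 75])   # first and third quartiles
    iqr = q3 - q1                            # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lead = [17.1, 21.4, 23.3, 23.4, 24.7, 25.6, 26.9, 26.9,
        27.3, 29.2, 33.1, 37.3, 44.9, 49.8, 58.8]
low, high = iqr_fences(lead)
print([x for x in lead if x < low or x > high])   # flags 58.8 as a potential outlier
```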

Examples

So far, we have three bodies of data: 15 lead concentration levels; 15 chromium concentration levels; and 20 arsenic concentration levels. All are expressed in mg/kg. The 15 lead concentration levels were presented on page 1; the 15 chromium concentration levels were presented on page 11; and 20 arsenic levels were presented on page 19. Each ordered data set, a comment about its size, and its five-number summary follow.

Lead (rounded to one decimal):
17.1 21.4 23.3 23.4 24.7 25.6 26.9 26.9 27.3 29.2 33.1 37.3 44.9 49.8 58.8
n = 15 (sample size is considered moderate)
minimum = 17.15   Q1 = 23.41   Q2 = 26.95   Q3 = 37.28   maximum = 58.77

Chromium (rounded to one decimal):
2.6 3.4 3.5 4.1 4.7 5.3 5.4 5.5 5.9 6.0 6.4 6.5 6.8 8.5 9.4
n = 15 (sample size is considered moderate)
minimum = 2.643   Q1 = 4.147   Q2 = 5.513   Q3 = 6.525   maximum = 9.412

Arsenic (rounded to two decimals):
1.55 1.92 2.06 3.10 4.17 4.28 4.66 4.72 5.73 6.34 6.90 7.63 7.78 7.81 9.18 10.20 10.40 14.10 15.60 17.80
n = 20 (sample size is considered moderate)
minimum = 1.55   Q1 = 4.20   Q2 = 6.62   Q3 = 9.95   maximum = 17.80


We now calculate the Upper Limit for each data set. We do not bother with the Lower Limit because anything below this limit is not a problem.

Heavy Metal    Upper Limit = Q3 + 1.5·IQR
Lead           37.28 + 1.5·(37.28 − 23.41) = 58.09
Chromium       6.525 + 1.5·(6.525 − 4.147) = 10.09
Arsenic        9.95 + 1.5·(9.95 − 4.20) = 18.58

Only the largest lead observation (58.77 mg/kg) exceeds its Upper Limit of 58.09; none of the chromium or arsenic observations exceeds its Upper Limit. Strictly, then, the maximum lead measurement is a potential outlier, while outliers are not present in the other two data sets.

The computer program Minitab was applied to all three data sets to glean information about each data set’s distribution and about the possibility of outliers. The boxplot for lead revealed a pattern consistent with a right-skewed distribution. Boxplots for chromium and arsenic favored a normal appearance. Minitab likewise flagged the last lead observation as an outlier, with no outliers detected for the other two data sets. For all three data sets, each maximum was close to its Upper Limit.

Note that confidence intervals were previously constructed using the chromium and arsenic data. The boxplots were constructed for these data sets prior to the analysis. An important parametric (i.e., distributional) assumption when using z or t is that the random variable under consideration (e.g., X̄ in this case) be normally distributed. The boxplots resulted in a normal appearance, suggesting validity to the approach taken. A formal test for normality is the Shapiro-Wilk test. Since it involves hypothesis testing, it is not presented until hypothesis testing has been explained.

Hypothesis Testing: The Classical Approach (Test Of One Mean)

Hypothesis testing is important because of its contribution to the decision making process. It is complementary to confidence interval construction. We explain the classical approach to hypothesis testing in terms of five steps. Each step will be explained in full with an example before proceeding to the next step. After the classical approach, we explain the p-value approach to hypothesis testing.

Step 1: State the null and alternative hypotheses.

A hypothesis is simply a statement that something is true. Effectively, hypothesis testing involves making a choice between two competing states of nature. The hypotheses are designed to be mathematical opposites. Each hypothesis purports to represent the way things really are, i.e., the true state of nature.

Unfortunately, we never know which state of nature is the true state of nature. All that we can do is to offer evidence supporting one state of nature or the other. Our evidence comes in the form of a statistical calculation performed on a body of data. Once we have statistical evidence, we are in a position to support one hypothesis or the other.

The situation that best promotes an understanding of hypothesis testing without a lot of mathematics is a jury trial. The competing states of nature are the prosecution’s allegation that the defendant is guilty and the defense’s allegation that the defendant is innocent. (Notice that these are opposite states of nature). The jury listens to the (statistical) evidence and decides, i.e., infers, which state of nature is appropriate based upon the evidence. Upon digesting the evidence, the chairperson of the jury reports that the defendant is either guilty or not guilty.

In hypothesis testing, the two competing states of nature are represented by a null hypothesis and an alternative hypothesis.

Null hypothesis: H0
⋅ The proposition being challenged.
⋅ Expressed in terms of a specific value of a population parameter: µ = µ0.

Alternative hypothesis: Ha
⋅ The opposite of (or alternative to) the null.
⋅ Expressed in terms of one or several values of the population parameter, different from the value given to µ0; e.g., µa equal to some other specific value, µa < µ0, µa > µ0, or µa ≠ µ0.

Example: You are at a wetland site. You have been instructed to test for chromium contamination. Before doing any testing, you realize that the two competing states of nature are:

State of nature 1: soil is not contaminated.
State of nature 2: soil is contaminated.

You desire to come up with values for µ that characterize each state of nature. Suppose that a regulatory agency has determined that a population mean chromium concentration equal to 7 mg/kg characterizes the no-contamination state. Likewise, a population mean chromium concentration significantly greater than 7 mg/kg characterizes the contamination state. Thus, in terms of values for µ, the hypotheses are:

H0: µ0 = 7 mg/kg

Ha: µa > 7 mg/kg

Pictures that characterize the sampling distribution of sample means for each state of nature are presented below. Notice that the picture of the sampling distribution representing the alternative hypothesis is positioned to the right of the picture representing the null hypothesis. Simply comparing the centers of the two provides a reason for this occurrence.

[Figure: two sampling distributions of X̄. The first, under the assumption that H0 is true, is centered at µ0 = 7; the second, under the assumption that Ha is true, is centered at some µa > 7, to the right of the first.]

Step 2: Before gathering evidence, decide upon a tail probability (α) associated with H0 being true. This tail probability is your admission that an eventual X̄ in this tail is so remote from µ0 that X̄ does not support this hypothesis as true.

(a) α is referred to as the “level of significance”.

(b) It is a tail-area probability, typically one of five common magnitudes: 0.001, 0.01, 0.025, 0.05, or 0.10. (In this example, let α = 0.05.)

(c) α is placed in that tail of H0 which would intersect a corresponding tail of Ha if the pictures of the two sampling distributions were superimposed on each other.

(d) α reflects the notion that a value of X̄ (i.e., your eventual sample result) in this region is so distant from µ0 that the “home” for your X̄ must be the other state of nature.

(e) α always goes with the picture corresponding with “H0 true.”

Example: α = 0.05 is placed in the upper tail of the “H0 true” sampling distribution.

[Figure: the sampling distribution of X̄ under the assumption that H0 is true, centered at µ0 = 7, with α = 0.05 shaded in the upper tail.]

Step 3: Establish a decision rule to assist in choosing between hypotheses.

(a) α is used to derive a tα. (Since α = 0.05, tα = t0.05. We will get a specific value for tα from the t-table once we know the degrees of freedom.)

(b) tα is a point of demarcation used for making a decision between the competing hypotheses. The two possible decisions are stated in terms of H0. The proper words for these two possible decisions are: "Fail to Reject H0" and "Reject H0".

(i) To one side of tα, we will go with H0. In this example, fail to reject H0 if our eventual X̄, converted to t, falls to the left of t0.05.

(ii) To the other side of tα, we will not go with H0; i.e., we will reject H0. In this example, reject H0 if our eventual X̄, converted to t, falls to the right of t0.05.

(iii) tα is referred to as the “critical value.”

[Figure: the H0-true sampling distribution of X̄, centered at µ0 = 7, with α = 0.05 shaded in the upper tail; beneath it, the decision-rule line marked “Fail to Reject H0” to the left of t0.05 and “Reject H0” to the right.]

Example: We know that a sample result to the left of tα will lend support to the null hypothesis. A sample result to the right of tα lends support to the alternative hypothesis. Notice that we have set up this decision rule before peeking at any data!

Step 4: Generate your samples. Using these data, calculate X̄ and s. Convert X̄ to t. Place t on the decision rule line.

(i) Site data were generated within budget. Suppose that nine samples were taken. (These chromium measurements were taken from a different location than the 15 previously used.) Results, in mg/kg, are:

10.1548 17.8599 10.2117 13.0761 17.0871 10.9996 13.5646 14.0418 13.1423

(ii) From the data, we calculate X̄ = 13.349 mg/kg and s = 2.751 mg/kg. The question is: Is our X̄ sufficiently far from µ0 = 7 mg/kg to warrant rejection of H0?

(iii) To answer the question in (ii), convert X̄ to t.

t = (X̄ − µ0) / (s/√n) = (13.349 − 7.00) / (2.751/√9) = 6.923

This sample result (X̄) is approximately seven standard errors to the right of the hypothesized mean of 7 mg/kg. (Wow! that’s really far!)

(iv) In order to place t on the decision rule line, we must first get tα. Having generated our samples, we know that n = 9. Thus, df = n − 1 = 9 − 1 = 8. Consulting the t-table on page 17, we see that, for df = 8 and α = 0.05, t0.05 = 1.860. Our sample result (X̄), converted to t = 6.923, can now be placed on the decision-rule line.

Example: The results are presented as follows:

[Figure: Sampling distribution of X̄ under H0 true, centered at µ0 = 7, with α = 0.05 in the upper tail; on the decision-rule line, t0.05 = 1.860 separates “Fail to Reject H0” from “Reject H0,” and the sample result t = 6.923 falls far to the right, in the rejection region.]

Step 5: Apply the decision rule by comparing t and tα . Make a decision. State your conclusion in words.

(i) Apply the decision rule by comparing t and tα. Notice that t = 6.923 lies far to the right of t0.05 = 1.860. That is, t > tα.

(ii) Make a decision. Since t > tα, reject H0. (Rejecting H0 means that we cannot support the claim that the mean chromium concentration level is 7 mg/kg.)

(iii) State the conclusion in words. Evidence is sufficient to show that the mean chromium concentration level is not 7 mg/kg. Evidence suggests that it is some level significantly greater than 7 mg/kg.
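For readers who prefer to verify these hand calculations in software, the following sketch (ours, using Python and scipy rather than the Minitab used elsewhere in this document) recomputes the test statistic and looks up the critical value:

    import numpy as np
    from scipy import stats

    # The nine chromium measurements (mg/kg) from Step 4
    x = np.array([10.1548, 17.8599, 10.2117, 13.0761, 17.0871,
                  10.9996, 13.5646, 14.0418, 13.1423])

    mu0 = 7.0                                 # hypothesized mean under H0
    n, xbar, s = len(x), x.mean(), x.std(ddof=1)

    t = (xbar - mu0) / (s / np.sqrt(n))       # test statistic
    t_crit = stats.t.ppf(0.95, df=n - 1)      # t_0.05 for df = 8

    print(f"xbar = {xbar:.3f}, s = {s:.3f}")  # 13.349 and 2.751
    print(f"t = {t:.3f} vs t_crit = {t_crit:.3f}")   # 6.923 vs 1.860
    print("Reject H0" if t > t_crit else "Fail to reject H0")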

The P-Value Approach To Hypothesis Testing

Revisit the diagram in Step 4 of the classical approach to hypothesis testing. Notice the one-to-one correspondence between the level of significance (α = 0.05) in the picture of the sampling distribution under H0 true and the critical value (t0.05 = 1.860) depicted on the decision-rule line. Notice, further, that each of these has a vertical bar; also, the vertical bar of one lines up precisely with the vertical bar of the other. When it comes to α and tα, one implies the other, so they ought to line up pictorially in this fashion. Specifically, given α, I will be able to find tα. Likewise, given tα, I will be able to find α.

Now, consider the other t-value presented on the decision-rule line. This represents an X̄ value from the sampling distribution picture converted to t by way of the t formula. Given what we said above, this suggests that this calculated t-value must have a corresponding probability value from the sampling distribution picture. The probability value associated with this calculated t-value (or test statistic) goes by the name p-value.

Locating the calculated t-value on the decision rule line and moving upward to the sampling distribution picture, it appears that there is no p-value that corresponds with the calculated t-value. Take note, however, that the tails of the sampling distribution never touch the horizontal axis; rather, they are asymptotic to the horizontal axis. There is a p-value in this case, but it is extremely small; i.e., it is in the vicinity of 0.00006 and is typically written p < 0.0001.

The attractiveness of the p-value is in its interpretation. It indicates how likely it would be to observe a test statistic at least as extreme as ours if, in fact, the null hypothesis were true. Small p-values provide evidence against the null hypothesis; larger p-values do not. The closer the p-value is to zero, the stronger the evidence against the null hypothesis. Whether a p-value is small or large is frequently put into perspective by comparing it to a chosen level of significance. Once this is done, decisions regarding a hypothesis test are made just as they were under the classical approach. That is, reject the null hypothesis when the p-value is smaller than the level of significance. Do not reject the null hypothesis when the p-value is larger than the level of significance.

For our hypothesis test example using the nine chromium observations, the p-value corresponding to the test statistic (t = 6.923) is less than 0.0001. Since this value is so small, we have strong evidence against the null hypothesis. Said another way, if the null hypothesis of no contamination were, in fact, the true state of nature for the situation that we are testing, the result that we actually observe, X̄ = 13.349 mg/kg (translated to t = 6.923), would be extremely unlikely. This result definitely does not support the null hypothesis.
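As a rough check on this claim, the upper-tail area beyond t = 6.923 can be evaluated directly; in Python, stats.t.sf gives the area to the right of a given t:

    from scipy import stats

    # Upper-tail p-value for the one-sided test: P(T > 6.923) with df = 8
    p = stats.t.sf(6.923, df=8)
    print(f"p-value = {p:.6f}")   # on the order of 6e-5, i.e., p < 0.0001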

Complementarity Between Hypothesis Testing And Confidence Interval Construction

Suppose that the direct exposure criterion for chromium is set at 14 mg/kg. (This level is chosen for illustrative purposes.) To get an idea of how the regulation makes use of an upper one-sided 95% confidence interval (similar to the one constructed for the arsenic example) to reach a conclusion about compliance, we begin with the chromium results: X̄ = 13.349, s = 2.751, n = 9, df = 8, and t0.05 = 1.860. Next, we use the formula for the upper limit (UL) of an upper one-sided confidence interval:

UL = X̄ + tα (s/√n)

Making the appropriate substitutions, we have:

UL = 13.349 + 1.860 (2.751/√9) = 15.05.

Since UL = 15.05 mg/kg and this value exceeds the direct exposure criterion of 14 mg/kg, compliance has not been achieved.

Notice that the conclusion using the confidence interval (i.e., compliance has not been achieved) is consistent with the result of the hypothesis test (i.e., evidence supports the contamination state of nature).
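The upper limit is equally easy to verify in a few lines; the 14 mg/kg criterion below is the illustrative figure used above, not an actual regulatory value:

    import numpy as np
    from scipy import stats

    xbar, s, n = 13.349, 2.751, 9
    UL = xbar + stats.t.ppf(0.95, df=n - 1) * s / np.sqrt(n)
    print(f"UL = {UL:.2f} mg/kg")                    # about 15.05

    criterion = 14.0                                 # illustrative criterion
    print("Compliant" if UL <= criterion else "Not compliant")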

Testing For Normality: The Shapiro-Wilk Test

One of the most powerful statistical tests for normality is the W-test developed by Shapiro and Wilk. The mathematics behind the test statistic is quite involved. Basically, the test works as follows. First, the observations are arranged in order, from smallest to largest value. Next, a probability is calculated for each observation under the assumption that it comes from a normal distribution. This is done by first computing the mean and standard deviation for the data set. Using these two measures and each data value, a z-score is then calculated. These z-scores are identified with probabilities under the normal curve. The probabilities are then plotted against the data values. Since there is curvature in the normal curve, the probabilities are converted to a log scale prior to plotting.

The null hypothesis for the test is that the data have a normal distribution. The alternative hypothesis is that the data do not have a normal distribution. If the data follow a normal distribution, we would expect that the data values match the probabilities assumed under normality. This suggests that there should be a high correlation (i.e., a high degree of linear association) between the scaled probabilities and the original observations.

The W-test for normality involves the calculation of a correlation coefficient (R). Given the null and alternative hypotheses, the correlation coefficient is transformed to the W-statistic. A test of significance is performed. Computer packages like Minitab typically report the correlation coefficient and the p-value for the test.

Normality tests were performed on the four heavy-metal data sets. Normality was rejected for the lead (n = 15) data but not for the chromium (n = 15) data, the chromium (n = 9) data, or the arsenic (n = 20) data. P-values for the specific data sets are reported below. Recall that small p-values lead to rejection of the null hypothesis of normally distributed data.

Heavy Metal          P-Value    Test Result
Lead (n = 15)        0.0339     Reject Normality
Chromium (n = 15)    >0.1000    Do Not Reject Normality
Chromium (n = 9)     >0.1000    Do Not Reject Normality
Arsenic (n = 20)     >0.1000    Do Not Reject Normality

The following four figures are Minitab's way of doing the W-test. Notice that p-values presented in the lower right-hand corner of each figure are summarized in the table above.
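For those working outside Minitab, scipy's stats.shapiro performs the W-test directly; a minimal sketch on the nine chromium observations follows. (It returns an exact p-value rather than Minitab's ">0.1000" reporting convention, so the numbers will not match the table digit for digit.)

    from scipy import stats

    chromium9 = [10.1548, 17.8599, 10.2117, 13.0761, 17.0871,
                 10.9996, 13.5646, 14.0418, 13.1423]

    W, p = stats.shapiro(chromium9)          # Shapiro-Wilk W-test
    print(f"W = {W:.4f}, p = {p:.4f}")

    # Small p-values reject the null hypothesis of normality
    print("Reject normality" if p < 0.05 else "Do not reject normality")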

Hypothesis Testing: Comparison Between Two Means

Beyond testing one population mean, as in the previous example, the five-step hypothesis-testing procedure can be adapted to a variety of situations. It is particularly useful for making comparisons.

An example is the comparison between site and background data. Here we make use of the previous site data (n = 9). It is to be compared against background data (n = 6). The data are presented as follows (in mg/kg):

Site       Background
10.1548    6.3252
17.8599    7.6762
10.2117    8.0639
13.0761    3.6461
17.0871    6.8746
10.9996    9.7481
13.5646
14.0418
13.1423

The appropriate issue is whether or not site concentrations of chromium are significantly higher than background concentrations. From an inferential statistics perspective, the issue is whether the data sets come from the same population or from different populations. Alternatively, this is equivalent to asking if there is a difference between the population means of the two bodies of data.

Expanding these statements with notation, if the site concentrations (S) and the background concentrations (B) are the same, then:

(1) S and B come from the same population, or

(2) µS = µB (the means of the two populations are the same), or

(3) subtracting µB from each side, µS − µB = 0; i.e., the difference in means is zero.

If the site concentrations (S) are greater than the background concentrations (B), then

(4) S and B come from different populations, or

(5) µS > µB (the mean of S exceeds the mean of B), or

(6) subtracting µB from each side, µS − µB > 0; i.e., the difference in means is positive.

Notice that any one of (1)-(3) represents one state of nature and any of (4)-(6) represents the competing state. When making inferences between these two states of nature, we set up the hypotheses in terms of the difference between two means. Hence, we formulate (3) as the null hypothesis and (6) as the alternative hypothesis. We make a judgment about which state of nature to support by looking at the behavior of the sampling distribution of the difference between the two sample means; i.e., we look at the behavior of X̄S − X̄B.

As indicated previously, the five steps to hypothesis testing are the same when testing one mean or two means. The only distinction relates to the form of the test statistic, i.e., t, in each situation. Recall, when testing one mean, we needed to convert X̄ to t in order to apply the decision rule. Also recall the relationship between X̄ and t:

t = (X̄ − µ0) / (s/√n).

When testing for the difference between two means, we must consider not one X̄, but the hypothesized relationship between two X̄s. We likewise must convert this difference, X̄S − X̄B, to t in order to apply the decision rule. The relationship between X̄S − X̄B and t is

t = [(X̄S − X̄B) − (µS − µB)] / √(s²S/nS + s²B/nB).

This form of t may seem bizarre, but it abides by the same logic as t for the one-sample case. Specifically, the numerator addresses the degree to which the difference between sample means (i.e., X̄S − X̄B) deviates from the hypothesized difference between population means (i.e., µS − µB) when H0 is true. That is, we are looking at the degree to which X̄S − X̄B deviates from zero (where zero is the value of µS − µB when H0 is true). This numerator is judged large or small relative to a measure of the standard error of the two data sets combined. This combined standard error is the item appearing in the denominator of the t statistic. There is a very complicated formula for df when implementing this test. That complicated formula is not presented here. The following one works just about as well:

df = (nS − 1) + (nB − 1) = nS + nB − 2.

We now carry through the example to test for the difference in two means.

Example: You are at a wetland site. You have been instructed to test for chromium contamination. To conduct a proper test, you gather site and background samples as presented on page 31.

Summary statistics generated for each data set are as follows:

Site            Background
X̄S = 13.349     X̄B = 7.056
sS = 2.751      sB = 2.042
nS = 9          nB = 6

Determine, at the 0.025 level of significance, whether the site observations have a significantly higher mean level of chromium concentration than the background observations. (Note: The normality tests reported previously provided no evidence that chromium concentration levels depart from normality.)


This is a situation where we are testing for the difference in two means. The steps are as follows:

Step 1: State the null and alternative hypotheses.

H0: µS − µB = 0 (There is no difference in means.)

HA: µS − µB > 0 (The difference in means is positive.)

Pictures of the sampling distributions are as follows:

[Figure: Sampling distribution of X̄S − X̄B under H0 true, centered at µS − µB = 0, and under Ha true, centered at µS − µB > 0.]

Step 2: Decide upon a level of significance: α .

[Figure: Sampling distribution of X̄S − X̄B under H0 true, centered at µS − µB = 0, with α = 0.025 in the upper tail.]

Recall that α reflects the notion that a value of X̄S − X̄B (i.e., your eventual sample result) in this region is so distant from µS − µB = 0 that the “home” for your X̄S − X̄B must be the other state of nature. The problem states that α = 0.025.

Step 3: Establish a decision rule to assist in choosing between hypotheses.

[Figure: Sampling distribution under H0 true with α = 0.025 in the upper tail, centered at µS − µB = 0; on the decision-rule line, t0.025 = 2.160 separates “Fail to Reject H0” from “Reject H0.”]

Since α = 0.025, tα = t0.025. We can find t0.025 once we calculate df for this problem. So, df = nS − 1 + nB − 1 = nS + nB − 2 = 9 + 6 − 2 = 13. Consulting the t-table, t0.025 for df = 13 is 2.160.

Our sample result X̄S − X̄B will eventually be converted to t. If this t ≥ t0.025, we reject H0. If t < t0.025, we fail to reject H0.

Step 4: Convert sample statistics to t. Place t on the decision rule line.

t = [(X̄S − X̄B) − (µS − µB)] / √(s²S/nS + s²B/nB) = [(13.349 − 7.056) − 0] / √((2.751)²/9 + (2.042)²/6) = 6.293 / √1.5359 = 5.08.

[Decision-rule line: “Fail to Reject H0” to the left of t0.025 = 2.160, “Reject H0” to the right; the sample result t = 5.08 falls in the rejection region.]

Step 5: Apply the decision rule by comparing t and tα. Make a decision. State your conclusion in words.

Since t > t0.025, we reject H0. Evidence suggests that the difference in means is positive; that is, the mean concentration of chromium at the site locations is significantly greater than at background.
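The arithmetic of Steps 4 and 5 can be reproduced as follows. This sketch follows the text's combined standard error and the simplified df = nS + nB − 2; scipy's built-in ttest_ind would instead use the more exact Welch degrees of freedom:

    import numpy as np
    from scipy import stats

    site = np.array([10.1548, 17.8599, 10.2117, 13.0761, 17.0871,
                     10.9996, 13.5646, 14.0418, 13.1423])
    background = np.array([6.3252, 7.6762, 8.0639, 3.6461, 6.8746, 9.7481])

    # t = (difference in means - 0) / combined standard error
    diff = site.mean() - background.mean()
    se = np.sqrt(site.var(ddof=1) / len(site) +
                 background.var(ddof=1) / len(background))
    t = diff / se                                    # about 5.08

    df = len(site) + len(background) - 2             # simplified df = 13
    t_crit = stats.t.ppf(1 - 0.025, df)              # 2.160
    print(f"t = {t:.2f}, t_crit = {t_crit:.3f}")
    print("Reject H0" if t > t_crit else "Fail to reject H0")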

Incorrect Decisions In Hypothesis Testing

Hypothesis testing is the state of the art when it comes to utilizing statistical information for decision making purposes. The results of a hypothesis test are, however, never infallible. Keep the following in mind when conducting a hypothesis test. We never know which state of nature is true. On top of this, we use imperfect and incomplete information to make an inference about the unknown state of nature. Better information should promote a correct decision. But even with the best of information, we run the risk of making an incorrect decision.

First, let’s return to our nonmathematical jury example to illustrate the decisions that can be made. These go beyond simply finding the defendant guilty or not guilty. There are, in fact, four decisions that can be made in a jury trial. Two of them are correct, and two of them are incorrect.

The jury can find the defendant:
· guilty, when in fact the defendant is guilty
· not guilty, when in fact the defendant is guilty
· guilty, when in fact the defendant is not guilty
· not guilty, when in fact the defendant is not guilty.

First, review these in your mind, and you will determine that these really are four different decisions. Furthermore, two of these are correct decisions, and two are incorrect. The first and last decisions are correct; the second and third decisions are incorrect. Now, think about the consequences of the two incorrect decisions. Each one represents large personal and societal costs. Yet, these incorrect decisions are inescapable. All that we can try to do is to minimize their probability of occurrence.

Relate the above example to the test of the difference in two means that was previously presented. Another way of stating the null hypothesis is that the soil is not contaminated. Another way of stating the alternative hypothesis is that the soil is contaminated. The four possible decisions that can be made are as follows:

The Environmental Professional can find the soil to be:
· contaminated, when in fact the soil is contaminated
· not contaminated, when in fact the soil is contaminated
· contaminated, when in fact the soil is not contaminated
· not contaminated, when in fact the soil is not contaminated.

As in the previous example, the second and third decisions are incorrect. Each one has a hefty cost associated with it. Decision two results in a heavy societal cost and an environmental cost that is perpetuated as a result of the area not being cleaned up when it should have been. In addition, word eventually gets out that you made a mistake, and the firm you represent gets a bad reputation. Decision three results in unnecessary clean-up costs and a contagion of fear that something is wrong with the area when in fact nothing is wrong.

These incorrect decisions are exacerbated by trying to discover the limits of contamination before this study begins. For example, placing a sample into a “hot spot” camp when it really doesn’t belong there can really change your analysis and the conclusion of your hypothesis test.

Again, there is no way to eliminate completely the possibility of these incorrect decisions. All that we can do is to try to minimize the probability of their occurrence. If we are in the unfortunate situation of being forced to choose between incorrect decisions, we should choose that incorrect decision which has the least drastic consequences.

All of these issues can be illustrated with the pictures that we developed for our hypothesis test problems.

We begin by generalizing the decisions that the jury and the Environmental Professional can make. For any hypothesis test, we can:
· fail to reject the null hypothesis, when in fact the null hypothesis is true
· reject the null hypothesis, when in fact the null hypothesis is true
· fail to reject the null hypothesis, when in fact the alternative is true
· reject the null hypothesis, when in fact the alternative is true.

Notice that the first and last decisions are correct; the second and third decisions are incorrect.

You may be wondering why we choose these specific phrases (before and after the word “when”) in stating the four possible decisions above. The reason is that these are the identical phrases presented in the pictorial representation of both the competing states of nature and the decision rule in our previous hypothesis test situations.

For example, previous pictures had this appearance:

[Figure: Sampling distribution under H0 true with tail area α, above a decision-rule line marked “Fail to Reject H0” to the left of tα and “Reject H0” to the right; below it, the sampling distribution under Ha true.]

Clearly, the following phrases are presented in the picture above: fail to reject H0; reject H0; H0 true; and Ha true.

We now can relate these phrases by focusing on α in the picture. First, notice that α is associated with both a particular state of nature (i.e., the state of nature that H0 is true) and a particular course of action (i.e., α instructs us to reject H0).

Second, notice that α is an area or a probability. In fact, it is the probability of rejecting H0 when H0 is true. Notice what we have just done. We have assigned a probability to one of the four specific decisions that we can make. Truly, this is the spirit of risk assessment!

If we have identified the probability associated with one of the four decisions, surely we must be able to do the same for the remaining three decisions. This becomes possible by providing symbols to the three remaining areas left unlabelled in the previous picture. It is reproduced below. Unlabelled areas are now labelled as 1-α, β, and 1-β.

[Figure: Under H0 true, area 1−α over “Fail to Reject H0” and area α over “Reject H0”; under Ha true, area β over “Fail to Reject H0” and area 1−β over “Reject H0.”]

Regarding the top normal curve, if α is a tail area, the remaining area must be 1−α (because the total area under the curve must be 1). Associating 1−α with a specific state of nature and a course of action, 1−α is the probability of failing to reject H0 when H0 is true. Thus, we have assigned a probability to the first possible decision. (Notice that this is a probability attached to a correct decision.)

The next probability is labeled β . Through similar reasoning, β is the probability of failing to reject H0 when Ha is true. (This is a probability attached to the other wrong decision that can be made). Finally, if β is a tail area associated with the bottom normal curve, the remaining area of that curve must be 1-β.

Associating 1-β with a specific state of nature and a course of action, 1-β is the probability of rejecting H0 when Ha is true. (This is a probability attached to the second correct decision). We summarize the decisions, their correctness, and their probabilities as follows:

Decision                              Was this the correct thing to do?   Probability
Fail to reject H0 when H0 is true     yes                                 1−α
Reject H0 when H0 is true             no                                  α
Fail to reject H0 when Ha is true     no                                  β
Reject H0 when Ha is true             yes                                 1−β

Several of the items in the table above have common statistical names. They are:

Item                                  Name
Reject H0 when H0 is true             Type I error
Fail to reject H0 when Ha is true     Type II error
α                                     Probability of a Type I error
β                                     Probability of a Type II error
1−β                                   Power of the test; the probability of not making a Type II error

It is important for a risk assessor to evaluate the magnitude of the probabilities presented above. The best of all worlds is to have 1-α and 1-β be as large as possible and to make α and β as small as possible. Unfortunately, manipulating one of these items affects the remaining three. Consequently, before we proceed with calculating either the probability of a Type II error or the power of a test, we will demonstrate the tradeoffs resulting from these manipulations. A firm understanding of the consequences of these manipulations sets the stage for appreciating any calculation we eventually make.

We begin with that probability over which the decision maker has control. This item is α. What does it mean to say that a decision maker has control over α?

The easiest way to understand this is to consider that a regulatory agency can set a regulatory threshold (RT) that effectively is a point of demarcation between concluding that a site is or is not contaminated.

Suppose this point is in terms of a mean, and it is denoted as X̄RT (in the one-sample case) or as X̄RT,DIF (in the two-sample case). We will proceed with using the X̄RT notation. Similar logic applies to X̄RT,DIF.

Now, let X̄RT be introduced to the following customary picture:

[Figure: Sampling distribution under H0 true with tail area α; a t-scale decision-rule line marked “No Contamination” to the left of tα and “Contamination” to the right, with a parallel X̄-scale line marked the same way around X̄RT.]

Notice that the regulatory agency's decision rule line is comparable to the decision rule line that we formulated as Step 3 in hypothesis testing. Now, however, the decision rule is specified in terms of X̄ rather than t. For this situation, we notice that X̄RT and tα line up vertically. This means that tα is nothing more than a transformation of X̄RT. Specifically, since

t = (X̄ − µ) / (s/√n),

it follows that

tα = (X̄RT − µ0) / (s/√n).

Importantly, we can calculate tα once we have: (1) X̄RT; (2) a value of µ under the null hypothesis true, i.e., µ0; (3) our calculated standard deviation s; and (4) n.

Suppose that we follow this procedure using our nine samples and calculate tα to be 1.862. (We use a hypothetical value X̄RT = 8.707 to make things "work.") What do we do with tα?

We know that tα is associated with a tail area α. Once we know tα, we can find α if we are given the degrees of freedom. Next, we use the t-table in reverse. We locate df = n − 1 = 9 − 1 = 8 under the df column of the t-table. Locating “8,” we go across until we find a number that is close to 1.862 (our tα value). The closest entry in the body of the table is 1.860. We go up the column identified with the 1.860 entry and see that its title is t0.05. Thus, tα = 1.860 = t0.05. This means that α = 0.05. Thus, the regulatory threshold is identified with an α of 0.05. This is the meaning of the decision maker having control over α.
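The "table in reverse" step is simply an upper-tail-area evaluation. A sketch, using the hypothetical X̄RT = 8.707 from above:

    from scipy import stats

    xbar_rt, mu0, s, n = 8.707, 7.0, 2.751, 9    # XRT is hypothetical

    # Translate the regulatory threshold into t_alpha
    t_alpha = (xbar_rt - mu0) / (s / n ** 0.5)   # about 1.862

    # The upper-tail area identified with this t_alpha (df = 8)
    alpha = stats.t.sf(t_alpha, df=n - 1)
    print(f"t_alpha = {t_alpha:.3f}, alpha = {alpha:.3f}")  # about 0.05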

The next question ought to be:

Q: If α is the probability of making an incorrect decision, and if I have control over α, doesn’t it pay for me to make its value even smaller? By doing so, I not only lower α, but I also increase 1-α, which is the probability of a correct decision!

A: If the regulatory agency is iron-fisted and sets X̄RT itself, which translates into α being fixed, you have no leeway for manipulating α. But, realistically, each situation is unique, and you can exercise control over X̄RT, which means you can adjust α.

Now if you do make α smaller, you get yourself into the following dilemma. Notice the picture below. It is our familiar set-up for the two states of nature. The decision-rule line is purposely omitted.

[Figure: Sampling distributions under H0 true (tail area α) and Ha true (tail area β), with a solid vertical line at the initial cutoff and a broken vertical line, to its right, at the smaller-α cutoff.]

α is initially set at a level consistent with the solid vertical line. Notice that setting α at this level sets β according to this same vertical constraint. Now, decrease α as suggested above. This translates into shifting the solid vertical line rightward. This shift is represented by the broken vertical line. Notice the new magnitude of β: it becomes larger as a result of this move.

A new question arises.

Q: Are there negative consequences for making α smaller?

A: Yes, by making α smaller, we make β larger. That is, by making the probability of a Type I error smaller, we make the probability of a Type II error larger. Not only this, but we also reduce 1-β, which itself is the probability of another correct decision and is referred to as the power of the test. Thus, the power of the test is reduced.

Hence, there is no free lunch for manipulating α in this fashion! There are always negative consequences.

Q: Isn’t there anything I can do to lessen α and β?

A: Yes, there is. But it will cost you. Specifically, recall the workings of the Central Limit Theorem. Suppose you change the number of samples that you take from “small” to “large”. What will happen to the pictures of the sampling distributions on the previous page? They will become more peaked around their means and less variable (i.e., have smaller tails) as indicated below. Smaller tails mean α and β are lessened. This is reasonable, because the more information we have, the more informed our decisions will be. With more information, we begin to avoid making incorrect decisions. The power of our test increases.

[Figure: Under H0 true, the small-sample curve has α = 0.0475 while the large-sample curve has α = 0.0087; under Ha true, the small-sample curve has β = 0.1056 while the large-sample curve has β = 0.0367.]

Final Question:

Q: Due to budgetary reasons, suppose I cannot increase the number of samples I take. Also, suppose I exercise control over α. What should I do?

A: Recall that there is an inverse relationship between α and β. Both α and β are bad things, but you are able to judge which one is worse than the other. In this situation, there is no way to avoid α and β. Thus, you should minimize the level of that error which is the more detrimental of the two.

A Calculation For β and 1-β

We now calculate β, the probability of a Type II error, or the probability of failing to reject H0 when Ha is true. This translates into concluding that a site is not contaminated when in fact it is.

We will perform the calculation in the context of our two-sample test. The first thing we will do is determine the items needed to perform the calculations. This is accomplished by observing where β is located in the pictorial framework of a hypothesis test problem.

On page 36 we see that β is identified with the sampling distribution of the test statistic under the assumption that the alternative hypothesis is true. That situation is presented again in the picture on page 41.

[Figure: Top, the sampling distribution of X̄S − X̄B under H0 true, centered at µS − µB = 0, with areas 1−α and α = 0.025 and a decision-rule line at t0.025 = 2.160 separating “Fail to Reject H0” from “Reject H0”; bottom, the distribution under Ha true, centered at µS − µB > 0, with areas β and 1−β.]

We illustrate the mechanics of calculating β with the following steps:

(1) We find that value of X̄S − X̄B that matches up with α = 0.025. Notice that the value of tα associated with α = 0.025 is 2.160. Also, the value of µS − µB under H0 true is zero. The pooled standard error is √1.5359 = 1.239 (from page 34). Inserting these values into the formula for t, we get

2.160 = [(X̄S − X̄B) − 0] / 1.239

which, after rearranging, yields X̄S − X̄B = (2.160)(1.239) = 2.676.

(2) This value of X̄S − X̄B is the same one represented by the vertical bar separating β and 1−β in the bottom part of the picture presented above. To calculate β, all we need is a value of µS − µB postulated by the alternative hypothesis. So, the choice is ours.

(3) Select µS − µB = 7. The t-value corresponding with X̄S − X̄B = 2.676, µS − µB = 7, and a pooled standard error of 1.239 is

t = (2.676 − 7) / 1.239 = −3.49.

(4) We can find a probability that gets paired with this t-value by consulting the t-table. Locate 13 degrees of freedom; then go across the row that corresponds with this entry. Find that value of t that is closest to the absolute value of -3.49. The closest number to 3.49 is 3.012, and this is associated with a probability of 0.005. Our result, then, is associated with a probability less than 0.005.

(5) Thus, the probability of a Type II error, or β, is less than 0.005 for a µS − µB of 7. The power of the test, or 1−β, is greater than 0.995.

(6) A power curve can be developed using different values of µS − µB under Ha true. This entire process can be reworked using different levels of α.

(7) The items in (6) provide information about the consequences of a hypothesis test over wide ranges of values for the alternative hypothesis parameter.
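Steps (1)-(5) can be scripted so that the power curve of item (6) is just a loop over candidate values of µS − µB. The sketch below follows the text's shortcut of re-centering the same t distribution; a more exact treatment would use the noncentral t distribution:

    from scipy import stats

    df, se, alpha = 13, 1.239, 0.025

    t_crit = stats.t.ppf(1 - alpha, df)    # 2.160
    cutoff = t_crit * se                   # critical XbarS - XbarB, about 2.676

    for true_diff in (3.0, 5.0, 7.0):      # candidate values under Ha
        t_beta = (cutoff - true_diff) / se
        beta = stats.t.cdf(t_beta, df)     # P(fail to reject H0 | Ha true)
        print(f"muS - muB = {true_diff}: beta = {beta:.4f}, "
              f"power = {1 - beta:.4f}")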

Sample Size Issues

Suppose that you can specify levels of both α and β with which you feel comfortable. If you can do this, you will be able to find the sample size n that results in these specific values of α and β. We will address this when we deal with strategies to determine the proper sample size.

Behavior Of Observations Having A Lognormal Distribution

The standard assumption when working with data is that they are normally distributed (Figure A). Environmental data are such that they frequently abide by a distribution that is anything but normal. More specifically, the distribution may be a member of the skewed distribution family (Figure B). One such distribution is the lognormal.

[Figure A: a normal distribution with an extreme value XE far in its right tail. Figure B: a right-skewed distribution with the same XE well within its long right tail.]

Knowledge about the underlying parent population from which data are generated is important because this assists in assessing the probability of an apparently extreme observation (XE) and thus whether the observation is indeed probable or should be treated as a possible outlier.

For example, the extreme XE has a low probability of occurrence in Figure A and thus appears to be an aberration if it is assumed that the data are normally distributed. If the parent population is truly lognormal (Figure B), we see that the same XE has a much higher probability of occurrence and should not be treated as an aberration.

To get an idea of the relation between tail areas and corresponding values of X for each type of distribution above, I used Minitab to store 89 arsenic observations in C1. Next, I took the natural logarithm of each value in C1 and stored the entire set in C2. Thus, C2 contains the logs of the original 89 observations. Finally, I calculated the descriptive statistics for C1 and C2. These are presented as follows:

Variable   N    Mean     Median   TrMean   StDev    SE Mean
C1         89   7.855    6.430    7.294    6.032    0.639
C2         89   1.7857   1.8610   1.7982   0.7807   0.0828

Recall the definition of standard normal z:

z = (X − µ) / σ

Suppose we desire that value of z identified with 0.025 in the right tail of the z-curve. This is z0.025 = 1.96. Assume that C1 is normally distributed and that the mean value 7.855 reported for C1 and the standard deviation 6.032 are fair approximations for µ and σ, respectively. Substituting these items into the equation directly above, we have

1.96 = (X − 7.855) / 6.032

In order to find that value of X identified with 0.025 in the right tail of the normal curve, we simply solve for X in the equation directly above. Thus

X = µ + zσ = 7.855 + 1.96(6.032) = 19.677, which means that the X value equal to 19.68 has 0.025 of the area under the normal curve to its right if, in fact, X is normally distributed.

Now, suppose that the variable under discussion is not normally distributed but lognormally distributed. This translates into ln X, not X, being normally distributed. We now desire to find that value of ln X with 0.025 of the area under the normal curve to its right. Given the descriptive statistics for C2, we make the substitutions:

1.96 = (ln X − 1.7857) / 0.7807

Solving for ln X, we have

ln X = µ + zσ = 1.7857 + 1.96(0.7807) = 3.3158, or X = exp(3.3158) = 27.544, which means that the X value equal to 27.544 has 0.025 of the area under the lognormal curve to its right. This also means that a value, say, 19.677 (chosen to match the result of the first example) has an area greater than 0.025 under the lognormal curve to its right. In fact, the area to the right of 19.677 under the assumption of lognormality is calculated to be 0.0630. Thus, a value of 19.677 is more of an extreme occurrence under the assumption of normality compared to lognormality. Information about the underlying distribution of the observations definitely has an impact on one's assessment of “extreme” values.

Incidentally, a boxplot of the original data corroborated a distribution skewed to the right. A boxplot of the log-transformed data corroborated a normal shape. The more formal W-test supported these findings.
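The tail-area comparison is easy to reproduce. The numbers below are the C1 and C2 summary statistics quoted above; stats.norm.sf gives the area to the right under the standard normal curve:

    import numpy as np
    from scipy import stats

    mu_x, sd_x = 7.855, 6.032       # C1 (original scale)
    mu_y, sd_y = 1.7857, 0.7807     # C2 (log scale)

    x_normal = mu_x + 1.96 * sd_x                 # 19.677 under normality
    x_lognormal = np.exp(mu_y + 1.96 * sd_y)      # about 27.5 under lognormality

    # Area to the right of 19.677 if X is lognormal
    z = (np.log(x_normal) - mu_y) / sd_y
    print(f"{x_normal:.3f}, {x_lognormal:.3f}, "
          f"tail = {stats.norm.sf(z):.4f}")       # tail about 0.063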


Small Sample Sizes And Parent Distribution Departing From Normality

The confidence intervals that we constructed in previous sections relied on inserting the proper tα/2 or zα/2 values (for a two-sided confidence interval) or the proper tα or zα values (for either an upper- or lower-tailed, one-sided confidence interval) once the (1−α) level of confidence was specified. In our computer experiment that explored the behavior of the sample mean, we saw that normality of the sample mean kicked in, no matter what the underlying distribution of the parent population was, once the sample size reached 30; i.e., when n = 30. Normality is also robust to smaller sample sizes provided that the departure of the underlying parent population from normality is not too extreme.

Regarding environmental contaminant data, the underlying parent populations frequently are non-normal. In addition, the practitioner's data set usually consists of a small number of samples; i.e., n < 10. Moderately sized data sets (from 10 to 15 samples) may be considered a luxury.

We have presented two aids for gleaning information about the underlying distribution of the data. A boxplot provided a visual pertaining to the data’s distribution, and the W-test was a formal test of normality of the data.

The lognormal distribution is perhaps the most popular of the skewed distributions used by environmental practitioners to represent their data. Other distributions like the Weibull, Gamma, or Beta have a similar appearance to the lognormal and may even be more appropriate for a given situation. The conventional wisdom when it comes to testing a data set’s distribution is to test the null hypothesis that the original observations are normally distributed. If the null is rejected, transform the data by taking the natural logarithm. Test to see if the transformed data have a normal distribution. If the null is not rejected, continue the analysis using log transformed data. Unfortunately, not rejecting the null does not necessarily mean that the data are lognormally distributed. It simply means that the null cannot be rejected. Other distributions might be more appropriate, and a further test could be constructed between pairs of competing distributions.

In what follows, we emphasize the lognormal framework. We do not get into the issue of estimating the parameters of competing parent distributions and testing which of these distributions is best for our situation.

An Experimental Design: Set-Up For Generating Lognormal Parameter Estimates

Obtaining the parameter estimates that eventually get inserted into the limits for the confidence interval is quite involved. An extensive literature has developed relating to the pros and cons of the different methods. It is easy to get confused regarding the precise items that make up the desired confidence interval once the issue of parameter estimation is addressed. In addition, the literature cautions about the potential poor performance of the confidence interval itself, even after taking the purest approach.

To illustrate the level of involvement, we use an example borrowed from Gilbert (1987, p. 166). He generated values for a lognormally distributed random variable X, with population mean µX = 6.126 and population standard deviation σX = 8.667. He then did a logarithmic transformation on X to get Y = ln(X). It has population mean µY = 1.263 and population standard deviation σY = 1.048. So, up to this point, the shape of X is a smooth lognormal curve and skewed to the right. The shape of Y is normal.

Next, he took a random sample of 10 observations from X. Each value of X has an accompanying value of Y. These are presented below. They appear in his Table 13.1.

X: 3.161 4.151 3.756 2.202 1.535 20.76 8.42 7.81 2.72 4.43 Y: 1.1509 1.4233 1.3234 0.7894 0.4285 3.033 2.1306 2.0554 1.0006 1.4884

Summary statistics are:

X̄ = 5.89    sX = 5.69    Ȳ = 1.48235    sY = 0.75385

This sample information – both sample observations and sample statistics – mimics what we as practitioners have at our disposal. In this artificial setting, we have the additional luxury of knowing how the data were generated. We have the true probability distribution, the true mean, and the true standard deviation. We know what should be the form of our confidence interval. (This added information is most helpful for judging our eventual estimation results).

Parameter Estimators For A Lognormal Distribution

In the absence of population information about the variable of interest and prior to constructing a confidence interval, we would first explore the behavior of the data. Following advice contained in previous sections, we’d look at the boxplot and perform the W-test. Results for the present example would reveal that X is not normally distributed but that Y is. (We know that these results must hold because X was generated to be lognormal).

Our task is to construct an upper one-sided 95% confidence interval for the mean of original variable X.

We refer to the population mean of variable X specifically as µX. Again, the formula for an upper one-sided (1−α) confidence interval for µX is:

0 ≤ µX ≤ X̄ + tα·(s/√n)

We might think about applying the formula above to X̄, but because of the small sample size, the Central Limit Theorem cannot assure a sample mean that is normal, and the confidence interval will probably not work well. An alternative and perhaps more immediate inclination is first to construct the interval around µY, because we know that Y is normally distributed. It would appear as follows:

0 ≤ µY ≤ Ȳ + tα·(sY/√n).

Next, it seems reasonable to exponentiate the confidence limit to produce a confidence interval for the mean in terms of the original scale. However, this method actually produces a confidence interval for the median of the distribution, not the mean.

There is a solution, but it is linked to making the proper transformations on Ȳ and sY that will make them unbiased and minimum-variance estimators of µX and σX, respectively.

Gilbert (1987) summarizes several methods for obtaining estimators. He also proposes a simple method of estimating µX and σX, resulting in:

X̄ = exp(Ȳ + s²Y/2) and sX = X̄·[exp(s²Y) − 1]^(1/2), respectively.

By way of these formulas, notice, for example, that obtaining the estimator X̄ requires more than simply exponentiating Ȳ. Similar reasoning applies to sX.

To see how these equations performed, we tried them on the original data. Notice that calculating X̄ and sX requires only the substitution of Ȳ = 1.48235 and sY = 0.75385 into the two equations directly above. Our calculations were:

X̄ = exp(1.48235 + 0.75385²/2) = 5.85 and sX = 5.85·[exp(0.75385²) − 1]^(1/2) = 5.117.

Notice that these values are different from the sample statistics that were presented with the data. Also note that while X̄ = 5.85 compares well with µX = 6.126, sX = 5.117 is quite a bit smaller than σX = 8.667. Gilbert (1987, p. 167) states that this estimator is simple to calculate, but it is not efficient (i.e., does not have the standard deviation with the best properties).
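Gilbert's simple estimators amount to two lines of arithmetic. A minimal sketch (the helper-function name is ours):

    import numpy as np

    def lognormal_mean_sd(y_bar, s_y):
        """Gilbert's simple estimators of the mean and standard deviation
        of lognormal X from the sample mean and sd of Y = ln(X)."""
        x_bar = np.exp(y_bar + s_y ** 2 / 2)
        s_x = x_bar * np.sqrt(np.exp(s_y ** 2) - 1)
        return x_bar, s_x

    x_bar, s_x = lognormal_mean_sd(1.48235, 0.75385)
    print(f"X = {x_bar:.2f}, sX = {s_x:.3f}")    # about 5.85 and 5.117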

Getting Parameter Estimates: Probability Plotting

A second approach for obtaining parameter estimates begins by looking at the possible relationship between the ordered log values of the variable of interest and the cumulative probability for each of these values. On the basis of a plot of these two variables, the mean and variance of the variable in question can be estimated. It should be noted that, although presentation of this approach might seem redundant given that the procedure above already demonstrates a methodology for obtaining parameter estimates, this approach is useful for dealing with censored data, which will be addressed shortly. Thus, laying out the mechanics here will be useful when analyzing censored data later.

As we proceed, the terms quantile and percentile are used interchangeably. Recall that the pth quantile of a population is the number such that a fraction p of the population is less than or equal to this number. The pth quantile is the same as the 100pth percentile; for example, the 0.5 quantile is the same as the 50th percentile.

Again, the variable X is lognormally distributed. So, the variable Y = ln(X) is normally distributed. It is the variable Y with which we begin the analysis. Also, our sample size is small. (If the sample size were large, we could appeal to the Central Limit Theorem.)

This procedure is presented in Gilbert (1987, p. 168). Its steps are as follows.

(1) Order the n untransformed X observations from smallest to largest. The result is n order statistics: x[1] ≤ x[2] ≤ ··· ≤ x[n]. (At the same time, bring along the ordered set of logarithms y[1] ≤ y[2] ≤ ··· ≤ y[n].)


(2) For each value x[i], calculate (i − 0.5)·100/n. This item represents the percentile associated with the respective order statistic.

(3) Gilbert recommends plotting the order statistics against their percentiles on log-probability paper and then fitting a straight line by eye. He says this for two reasons. First, if a straight line can be fit among the plotted points, this corroborates normality. Second, once the line is fit, for any percentage value that is specified (e.g., 16, 50, and 84), the corresponding percentile (e.g., x16, x50, and x84) can be read from the plot after taking the appropriate antilog. It turns out that these three percentiles in particular are used to get estimates for µY and σ²Y. This is addressed in Step (5).

(4) Rather than using log-probability paper, fitting a line by eye (as suggested by Gilbert) and, in turn, obtaining estimates for x16, x50, and x84 by eye, a linear function can be estimated and fitted directly by regressing the ordered values of Y = ln(X) on the calculated percentiles. Notice that values for these two variables are obtained in Steps (1) and (2). The result will be a fitted linear regression equation of the form ŷi = a + b(percenti), where a is the estimated intercept from the regression, b is the estimated slope, percenti = (i − 0.5)·100/n from Step (2), and ŷi is the calculated or fitted value of yi obtained once all three items on the right-hand side are substituted into the equation. Also, the fit of the regression equation can be judged by its coefficient of determination, R². The closer that R² is to 1, the better the fit.

(5) As Gilbert (1987, p. 168) shows, estimates for µY and σ²Y can be obtained as follows, respectively:

Ȳ = ln(x50) and s²Y = [ (1/2) ln( (x50/x16)·(x84/x50) ) ]².

The former equation suggests that the estimated mean ought to be located at the 50th percentile. This is expected for a normally distributed variable. The latter equation provides a measure of spread based on data between the 16th and the 84th percentiles. It has a spread interpretation similar to interquartile range.

(6) The mean and standard deviation of the original distribution are then estimated by taking the calculations above and entering them in the same equations that appeared at the end of the previous section. Again, they are:

X̄ = exp(Ȳ + s²Y/2) and sX = X̄·[exp(s²Y) − 1]^(1/2), respectively.

To get an idea of the performance of this approach, we applied it to the 10 observations that we introduced for this example on page 45.

(1) All of the X and Y observations are ordered as follows: X: 1.535 2.202 2.72 3.161 3.756 4.151 4.43 7.81 8.42 20.76 Y: 0.4285 0.7894 1.0006 1.1509 1.3234 1.4233 1.4884 2.0554 2.1306 3.033

(2) All of the (i − 0.5)·100/n percent calculations are as follows:
Percent: 5 15 25 35 45 55 65 75 85 95

(3) Implementing Minitab, we can plot Y against Percent and notice a linear relationship. (This is not presented here).

(4) The fitted regression equation is ŷi = 0.294 + 0.023767(percenti). Also, R² = 0.911. When we successively let percenti = 16, 50, and 84 and substitute each of these into the regression equation, we get ŷi = 0.6743, 1.4824, and 2.2904, respectively. Notice that each ŷi is shorthand notation for ln(xi); i.e., each ŷi is already a logarithm. To get the estimated percentile x̂i, we must take the antilog of ŷi. These are 1.9626, 4.4035, and 9.8788, respectively.

(5) Estimates for µY and σ²Y are:

Ȳ = 1.4824 and s²Y = [ (1/2) ln( (4.4035/1.9626)·(9.8788/4.4035) ) ]² = 0.65296.

(6) Estimates for the mean and standard deviation of the original distribution are:

X̄ = exp(1.4824 + 0.65296/2) = 6.1036 and sX = 6.1036·[exp(0.65296) − 1]^(1/2) = 5.858.

X̄ = 6.1036 compares well with µX = 6.126; sX = 5.858 is quite a bit smaller than σX = 8.667. Both estimates are better than with the previous technique.
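The six steps translate directly into a least-squares fit, with np.polyfit playing the role of the line fitted by eye on log-probability paper. This is our sketch of the procedure, not Gilbert's code:

    import numpy as np

    x = np.array([1.535, 2.202, 2.72, 3.161, 3.756, 4.151,
                  4.43, 7.81, 8.42, 20.76])           # ordered observations
    y = np.log(x)
    n = len(x)
    percent = (np.arange(1, n + 1) - 0.5) * 100 / n   # 5, 15, ..., 95

    b, a = np.polyfit(percent, y, 1)                  # slope, intercept

    # Fitted log-percentiles at 16, 50, 84, then back-transform
    x16, x50, x84 = np.exp(a + b * np.array([16.0, 50.0, 84.0]))

    y_bar = np.log(x50)
    s2_y = (0.5 * np.log((x50 / x16) * (x84 / x50))) ** 2

    x_bar = np.exp(y_bar + s2_y / 2)
    s_x = x_bar * np.sqrt(np.exp(s2_y) - 1)
    print(f"X = {x_bar:.4f}, sX = {s_x:.3f}")         # about 6.10 and 5.86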

Land’s Approach To Get A Confidence Interval

Once the estimates have been obtained, the next step is to construct the upper limit (UL) for insertion into the upper one-sided (1−α) confidence interval. Land (1971, 1975) showed that this upper limit is:

UL = exp( Ȳ + 0.5·s²Y + sY·Hα/√(n−1) ),

where:

Ȳ = (1/n)·Σ Yi,  s²Y = (1/(n−1))·Σ (Yi − Ȳ)²,  and sY = √(s²Y).

Notice that this upper limit has components similar to those of the conventional confidence interval when using z or t. Specifically, it has one component to capture the estimate of the mean, one to capture the estimate of the standard error, and one to capture the potential distance between the sample mean and the population mean. This last component is Hα , and it has an interpretation that parallels tα or zα . Values for Hα can be obtained from tables provided by Land (1975). A subset of these tables is presented as Tables A10-A13 in Gilbert (1987).

Gilbert does not follow through with the example, but we do; i.e., we construct UL for a one-sided upper confidence interval. Items for insertion into UL are Ȳ = 1.48235, sY = 0.75385, n = 10, and Hα = 2.621. This results in:

UL = exp( 1.48235 + 0.5·(0.75385)² + (0.75385·2.621)/√9 ) = 11.303.

Gilbert points out that Land's method works provided that one is confident that the underlying distribution is lognormal. Millard and Neerchal (2001, p. 244) point out that, while Land's method is exact and has optimal properties, it is extremely sensitive to the assumption that the data come from a lognormal distribution. Ginevan and Splitstone (2004, p. 45) point out that lack of fit to a lognormal distribution may be nearly impossible to detect. They take the strong position that Land's procedure should never be used with environmental contamination data. In their view, bootstrapping is the best approach for constructing confidence intervals for means when the data are not normally distributed.
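For completeness, the Land upper limit can be reproduced as follows; note that Hα = 2.621 is read from Land's tables and is an input here, not something the software computes:

    import numpy as np

    y_bar, s_y, n, H = 1.48235, 0.75385, 10, 2.621   # H from Land's tables

    UL = np.exp(y_bar + 0.5 * s_y ** 2 + s_y * H / np.sqrt(n - 1))
    print(f"UL = {UL:.3f}")                          # about 11.30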

Dealing With Censored Data Sets

There are situations where the true concentration of the sample being measured may be close to zero. Under these circumstances, the actual measured value may be less than the measurement limit of detection (LOD). When this happens, laboratories may report these values as not detected (ND), as less- than (LT), or as zeroes. When data values below the LOD are unavailable, we say that the data are “censored to the left.”

The dilemma created by NDs, LTs, or artificial zeroes is that they taint the data set. After all, we are trying to characterize correctly the distribution that drives our confidence intervals. If observations are manipulated in this fashion, we end up biasing our estimates of X̄ and sX. It would be ideal if the laboratory had reported the actual value below the LOD, if this were possible. In the absence of this course of action, we must use a strategy that avoids bias or at least holds it to a minimum.

The table below lists four simple approaches that could be taken but that lead to biased estimates of X̄ and sX. Regarding the reporting of data, we assume that only LT values are reported when a measurement is below the LOD.

Censoring Approaches That Result in Biased Estimates of X̄ and sX
1. Use all measurements, including LT values.
2. Use only “detect” measurements. Ignore LT values.
3. Replace LT values with zeroes. Proceed with computations.
4. Replace LT values with some number between zero and the LOD. Proceed with computations.

Three approaches are suggested as possible alternatives when the data are censored. These are summarized in the table below.

Preferred Censoring Approaches When Trying to Find the Middle
1. Compute the sample median.
2. Compute the trimmed mean.
3. Winsorize the data. Compute the Winsorized mean and Winsorized standard deviation.


The first approach in the table above is appropriate when the distribution is symmetric because the mean and median will be the same for symmetric distributions. If the distribution is asymmetric and skewed to the right, then the sample median will tend to be smaller than the true mean. If the distribution is skewed to the left, then the sample median will tend to be larger than the true mean.

Regarding the second approach, let n be the sample size and p be the percentage of observations to be eliminated, or trimmed, from each end of the ordered data set, with 0 < p < 50. The 100p% trimmed mean is the sample mean computed after the smallest p% and the largest p% of the observations have been removed. For censored data, p must be at least large enough that all values below the LOD are trimmed away; like the median, the trimmed mean is an unbiased estimator of the mean only when the distribution is symmetric.

Finally, Winsorizing is a technique used with symmetric distributions. It involves removing the trouble values, e.g., the NDs at the lower end, and replacing each with the next largest and available value. At the other end, remove the same number of largest values and replace each with the next smallest and available data value. This revised set of observations is the Winsorized data set.

Suppose that there are n observations in total. Compute the sample mean based on the n Winsorized observations. This is the Winsorized mean. Call it X̄W. Compute the sample standard deviation based on the same n Winsorized observations. Call it s. (Note: This is not the Winsorized standard deviation.) Let v be the number of observations not replaced during the Winsorization process. The Winsorized standard deviation (sW) is defined as:

sW = s(n − 1) / (v − 1).

When constructing a confidence interval using t, the quantity (v-1) will be the number of degrees of freedom and sW is the proper standard deviation to use for insertion into the confidence interval formula.

It turns out that X̄W is an unbiased estimator of µ, and sW is an approximately unbiased estimator of σ.

Suppose the data happen to be skewed to the right. If a logarithmic transformation shows that the transformed data are from a normal distribution, suggesting that the original data might be lognormal, Winsorization can be used on the transformed data to get their mean and standard deviation. The respective Winsorized estimators ȲW and s²YW can then be inserted into the following formulas to get estimates of the population mean and standard deviation for the original lognormal distribution:

X̄ = exp(ȲW + s²YW/2) and sX = X̄·[exp(s²YW) − 1]^(1/2), respectively.
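A minimal sketch of Winsorization follows; the data set and the choice of k = 2 (two NDs at the bottom) are hypothetical, and the function name is ours:

    import numpy as np

    def winsorize_stats(data, k):
        """Replace the k smallest values with the (k+1)-th smallest and
        the k largest with the (k+1)-th largest; return the Winsorized
        mean, the Winsorized standard deviation, and v (the number of
        values left unreplaced). Requires k >= 1."""
        x = np.sort(np.asarray(data, dtype=float))
        n = len(x)
        x[:k] = x[k]                   # e.g., NDs at the lower end
        x[-k:] = x[-k - 1]             # same number replaced at the top
        s = x.std(ddof=1)              # sd of the Winsorized observations
        v = n - 2 * k
        sw = s * (n - 1) / (v - 1)     # Winsorized standard deviation
        return x.mean(), sw, v

    # Hypothetical set: the two lowest entries stand in for ND values
    mean_w, sd_w, v = winsorize_stats(
        [0.4, 0.6, 2.72, 3.16, 3.76, 4.15, 4.43, 7.81, 8.42, 20.76], k=2)
    print(mean_w, sd_w, v)             # use df = v - 1 in a t interval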

Getting Parameter Estimates: Censored Data and Probability Plotting

We return to the artificial data first introduced for this example. Suppose that, of the 10 observations, the first two were missing. We will implement the six-step probability-plotting procedure that we used earlier on the full set of data, with the exception that now the first two observations are missing.

(1) Ordered observations 3-10 for X and Y are: X: 2.72 3.161 3.756 4.151 4.43 7.81 8.42 20.76 Y: 1.0006 1.1509 1.3234 1.4233 1.4884 2.0554 2.1306 3.033

(2) For ordered observations 3-10, the (i-0.5)⋅100 / n percent calculations are: Percent: 25 35 45 55 65 75 85 95

(3) Implementing Minitab, we can plot Y against Percent and notice a linear relationship. (This is not presented here).

(4) The fitted regression equation is ŷi = 0.1731 + 0.02546(percenti). Also, R² = 0.866. When we successively let percenti = 16, 50, and 84 and substitute each of these into the regression equation, we get ŷi = 0.5805, 1.4461, and 2.3118, respectively. Notice that each ŷi is shorthand notation for ln(xi); i.e., each ŷi is already a logarithm. To get the estimated percentile x̂i, we must take the antilog of ŷi. These are 1.7869, 4.2465, and 10.0925, respectively.

(5) Estimates for µY and σ²Y are:

Ȳ = 1.4461 and s²Y = [ (1/2) ln( (4.2465/1.7869)·(10.0925/4.2465) ) ]² = 0.74935.

(6) Estimates for the mean and standard deviation of the original distribution are:

X̄ = exp(1.4461 + 0.74935/2) = 6.1766 and sX = 6.1766·[exp(0.74935) − 1]^(1/2) = 6.524.

X̄ = 6.1766 compares well with µX = 6.126; sX = 6.524 is smaller than σX = 8.667, but it is surprisingly closer to the population standard deviation than is the one based on the uncensored information.

Strategies To Determine The Proper Number Of Samples

Underlying everything that we have done so far is the tacit concern that the estimates that we generate are good enough to make reliable inferences. We require that each be an accurate representative of some unknown population parameter. To be an accurate representative, it must be based on a sufficient number of observations in order to provide useful information. So the question becomes: What is considered a sufficient number of observations? Fortunately, we have developed a full set of tools (represented by key formulas) that are able to guide us in choosing the sample size n so that our sample statistics achieve a prescribed level of accuracy. We present three methods for determining the proper sample size. Each method is based on a different rule. Successive rules make use of the rules that precede them.

Sample Size Based on Variance of the Sample Mean

We begin with the formula for the variance of the sample mean. It is defined as:

var X̄ = σ²/n.

Suppose it is mandated that var X̄ must be no larger than some prespecified level L. Substituting L for var X̄ in the equation above results in:

L = σ²/n.

If we solve the equation above for n, we obtain the sample size that assures us that var X̄ is no larger than L:

n = σ²/L.

Example: Recall the hypothesis test for one mean making use of the nine chromium observations. Descriptive statistics were: n = 9, X̄ = 13.349 mg/kg, and s = 2.751 mg/kg. Given s, we calculate s² = 7.568. Suppose that a new study is to be conducted in the same area as the one from which these descriptive statistics were generated. One key objective of the new study is that enough samples be taken so that the variance of the sample mean, i.e., var X̄, is no larger than 0.5 (mg/kg)².

In calculating the required sample size, the formula directly above requires both L, which is specified as 0.5, and σ², which we do not have. However, from the previous chromium study, we have the estimate s² = 7.568, which we can use in place of σ². Making the substitutions, we get:

n = σ²/L = 7.568/0.5 = 15.136 ≅ 15.

Thus, we need 15 samples in total to assure us that var X̄ will be no larger than 0.50. Since we have already generated n = 9 from the previous study, we simply need 15 − 9 = 6 new samples.
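A one-line check of the arithmetic appears below; note that rounding up (to 16) would strictly guarantee the variance ceiling, whereas the text rounds to the nearest integer:

    s2 = 2.751 ** 2    # variance estimate from the earlier chromium study
    L = 0.5            # required ceiling on var(Xbar)
    print(s2 / L)      # 15.1 -> 15 total samples, so 15 - 9 = 6 new ones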

Sample Size Based on Margin of Error of the Sample Mean

We begin with the formula for the (1-α) two-sided confidence interval using z:

X̄ − zα/2·(σ/√n) ≤ µ ≤ X̄ + zα/2·(σ/√n).

σ In the formula, notice the quantity that brings X into equality with µ . This quantity is z ⋅ , and we α 2 n will call it E. In our previous treatment of confidence intervals, we referred to E as the margin of error or

52 σ the maximum error of the estimate. (Also, recall the shorthand notation: σ= ). Thus: X n σ E = zαα22⋅=z ⋅σX . n

Because E is developed in the framework of a confidence interval, it can be given a probability interpretation. This is the result of zα/2 being included in the formula for E. Specifically, E is an absolute margin of error. Associated with it is an acceptably small probability α of that error being exceeded. Thus, we are interested in choosing n so that

Probability( |X̄ − µ| ≥ E ) ≤ α.

In this probability formula, both E and α are specified beforehand. Building on the chromium example, let E = 1 mg/kg and α = 0.05. The interpretation of the probability statement is as follows. We desire to find the sample size n so that there is only a 100α% = 5% chance that the positive or negative difference between the X̄ obtained from the n samples to be collected and the true mean µ is greater than or equal to 1 mg/kg.

Finding the value of n simply requires rearranging the formula for E above and solving for n:

n = ( z_{α/2}·σ / E )² .

Notice that actually implementing the formula requires having a value for σ. Since we rarely would have this value, we must use the sample standard deviation s in its place. In addition, since we are now using s rather than σ, we must use t_{α/2} in place of z_{α/2}. Thus, the formula for n becomes:

n = ( t_{α/2}·s / E )² .

We now face a dilemma. Recall that specifying any t_{α/2} value requires knowledge of the degrees of freedom (df), which translates into the need to know n. But n is the item for which we are solving in the first place. A plausible solution, after specifying a value for α, is to begin with the corresponding value for z_{α/2}.

After inserting values for E and s into the formula, obtain a first-round estimate for n; call it n₁. Now, having a value for sample size, calculate the corresponding degrees of freedom. Find the corresponding value for t_{α/2} and insert it, along with the values for E and s, into the formula to get a second-round estimate; call it n₂. Compare n₂ to n₁. If they are the same, no further iteration is needed. If not, do another iteration. After a few rounds, the value of n will stabilize.

Example: Again, the nine chromium observations yielded the following descriptive statistics: n = 9, X̄ = 13.349 mg/kg, and s = 2.751 mg/kg. Suppose that a new study is to be conducted in the same area as the one from which these descriptive statistics were generated. One key objective of the new study is to estimate the mean concentration of chromium. We are willing to accept a 10% chance (i.e., α = 0.10) of getting a data set for which E = |X̄ − μ| ≥ 1 mg/kg. We begin by using z_{α/2} = 1.645, s = 2.751, and E = 1. So,

n₁ = ( z_{α/2}·s / E )² = ( 1.645·2.751 / 1 )² = 20.48 ≅ 21 .

Now, we can proceed with using t. Since n₁ = 21, df = 20. For α = 0.10 and df = 20, t_{α/2} = 1.725. Our second-round estimate for n is n₂, calculated as:

n₂ = ( t_{α/2}·s / E )² = ( 1.725·2.751 / 1 )² = 22.51 ≅ 23 .

Repeating this procedure for n₃ results in t_{α/2} = 1.714 with n₃ = 22.23 ≅ 23. Since n₂ = n₃, we are through. In conclusion, n = 23 samples are needed in total. Since we already have 9, we must generate 23 − 9 = 14 additional samples.
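The iteration above is mechanical enough to automate. Below is a minimal Python sketch of the procedure, assuming SciPy is available; the function name sample_size_margin_of_error is ours. It starts from the z-based estimate and then refines with t using df = n − 1 until the sample size stabilizes:

```python
import math
from scipy.stats import norm, t

def sample_size_margin_of_error(s, E, alpha, max_iter=20):
    """Iterate n = (t * s / E)**2, starting from the z-based estimate."""
    z = norm.ppf(1 - alpha / 2)              # first round uses z in place of t
    n = math.ceil((z * s / E) ** 2)
    for _ in range(max_iter):
        t_val = t.ppf(1 - alpha / 2, df=n - 1)
        n_new = math.ceil((t_val * s / E) ** 2)
        if n_new == n:                       # the estimate has stabilized
            return n_new
        n = n_new
    return n

# Chromium example: s = 2.751 mg/kg, E = 1 mg/kg, alpha = 0.10.
print(sample_size_margin_of_error(2.751, 1.0, 0.10))  # 23
```

This sketch reproduces the text's answer of 23; the iteration converges in two rounds (21, then 23).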

Sample Size Based on Relative Error of the Sample Mean

It may be that a reliable estimate for σ is not available, but the practitioner might have an idea of the size of σ relative to the population mean μ. Notice that σ/μ, which is a relative standard deviation, also goes by the name coefficient of variation, and its symbol is η. This measure is appealing because it is less variable than σ.

Interest in the relative standard deviation suggests that we should likewise focus on the relative error (RE) of our estimator rather than on E. To get RE, we simply divide E by μ. That is,

RE = |X̄ − μ| / μ = E / μ .

We now substitute η and RE for σ and E, respectively, into the previous formula for n. Thus, the new formula for obtaining the desired sample size is:

n = ( z_{α/2}·η / RE )² .

Notice how the “relative” formula is related to the “unit-driven” formula:

n = ( z_{α/2}·η / RE )² = ( z_{α/2}·(σ/μ) / (E/μ) )² = ( z_{α/2}·σ / E )² .

The middle-bracketed portion of the equation above has a common μ in both its numerator and denominator. Its canceling effect results in the “unit-driven” formula.

Example: Again, the nine chromium observations yielded the following descriptive statistics: n = 9, X̄ = 13.349 mg/kg, and s = 2.751 mg/kg. Suppose that a new study is to be conducted in the same area as the one from which these descriptive statistics were generated. One key objective of the new study is to estimate the mean concentration of chromium. We are willing to accept a 10% chance (i.e., α = 0.10) of getting a data set for which the relative error exceeds 20%. (So, we would like to determine the sample size for which Probability[ RE ≥ 0.20 ] ≤ 0.10.) Suppose also that the practitioner views η = 0.50 as reasonable.

We begin by using z_{α/2} = 1.645, η = 0.50, and RE = 0.20. (Note that s plays no direct role here; η takes its place.) We solve for n₁ as follows:

n₁ = ( z_{α/2}·η / RE )² = ( 1.645·0.50 / 0.20 )² = 16.91 ≅ 17 .

This provides our first-round estimate for n. We could proceed in the fashion presented in the previous section to find the final value for n.
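A minimal sketch of the relative-error version, again assuming SciPy (the function name is ours):

```python
import math
from scipy.stats import norm

def sample_size_relative_error(eta, RE, alpha):
    """First-round sample size from n = (z * eta / RE)**2."""
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((z * eta / RE) ** 2)

# eta = 0.50, RE = 0.20, alpha = 0.10, as in the example above.
print(sample_size_relative_error(0.50, 0.20, 0.10))  # 17
```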

Nonparametric Statistical Tests

Whenever we use z or t (in hypothesis testing or confidence interval construction, for example), we are making a parametric assumption: that our observations come from a normal distribution. The word parametric implies that a distribution is assumed for the population under consideration. It is entirely possible that this assumption is too strong and therefore inappropriate for the situation. If it is, our test results may be misleading, and we risk making an incorrect decision.

More specifically, with environmental contaminant data, test results might reveal that both the variable under consideration and its logarithmic transformation are not normally distributed. Or, it may be that the number of samples available for analysis is so small that it would be a disaster to assume a particular parent distribution and to conduct statistical tests under this assumption.

Each of these dilemmas has led to the development of a branch of statistics known as nonparametric statistics. The word nonparametric implies that no distributional assumption is made for the population under consideration. Nonparametric tests tend to provide results that are robust when the usual parametric assumptions are violated.

Previously, we conducted a test for the difference between two means using the (parametric) two-sample t-test on Chromium Site and Background information (measured in mg/kg). We were justified in using the test because we had a sufficient number of observations to test for normality.

Suppose that only the first four Site observations and the first three Background observations were available. The amended data set is presented as follows.

Site        Background
10.1548     6.3252
17.8599     7.6762
10.2117     8.0639
13.0761

Under these circumstances, objections certainly could be raised about conducting the two-sample t-test because of the extremely small number of samples.

The Mann-Whitney Test

The Mann-Whitney test is the nonparametric counterpart to the two-sample t-test. The gist of this nonparametric procedure is to compare the central location of the two data sets; i.e., it seeks to determine if the centers differ from each other. The test proceeds to make a statement about their medians.

The set-up of the Mann-Whitney test is as follows. Let η_S (eta for S) and η_B (eta for B) denote the median Chromium levels for the site samples and the background samples, respectively. Then, the null (H₀) and alternative (Hₐ) hypotheses are:

H₀: η_S = η_B (median level for S and median level for B are the same).

Hₐ: η_S > η_B (median level for S is greater than median level for B).

Because the hypotheses are based on each data set’s median, we will focus on ordered observations. So, to apply the test, we first rank the data from both data sets combined, from smallest to largest. We then keep track of where the observation originated; i.e., whether it came from S or B. This information is summarized as follows:

Ordered Data:      6.32   7.67   8.06   10.15   10.21   13.07   17.85
Rank:                 1      2      3       4       5       6       7
Source (S or B):      B      B      B       S       S       S       S

Now, pay attention to the sum of the ranks for each group. For B, the sum is 1 + 2 + 3 = 6. For S, the sum is 4 + 5 + 6 + 7 = 22. The idea behind the test is simple. If the sum of the ranks for the site observations is “large” relative to the sum of the ranks for the background observations, the alternative hypothesis is supported.
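As a quick illustration, the combined ranking can be reproduced with SciPy's rankdata function (a minimal sketch; the variable names are ours):

```python
from scipy.stats import rankdata

site = [10.1548, 17.8599, 10.2117, 13.0761]
background = [6.3252, 7.6762, 8.0639]

ranks = rankdata(background + site)   # ranks of the combined sample
print(ranks[:3].sum())                # 6.0, rank sum for B
print(ranks[3:].sum())                # 22.0, rank sum for S
```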

A test statistic called W is calculated for the sum of the ranks for the first group referred to in the null hypothesis. (In this example, that group is S, and W = 4+5+6+7 = 22). Probability values are calculated for the particular value of W, sample size of the first group, and sample size of the second group. These results are reported routinely with computer statistical packages. For example, Minitab reported W = 22, and the p-value was 0.0259. Thus, we have strong evidence to suggest that the median level for S is greater than the median level for B.
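For readers reproducing this result in software, here is a minimal sketch using SciPy (version 1.7 or later for the method argument). Note that scipy.stats.mannwhitneyu reports the U statistic rather than the rank sum W; U equals W minus its minimum possible value n₁(n₁ + 1)/2 = 10, so U = 12 here. Its exact one-sided p-value (1/35 ≈ 0.0286) differs slightly from Minitab's 0.0259, which appears to come from a continuity-corrected normal approximation, but the conclusion is the same:

```python
from scipy.stats import mannwhitneyu

site = [10.1548, 17.8599, 10.2117, 13.0761]
background = [6.3252, 7.6762, 8.0639]

# One-sided test of H0: eta_S = eta_B against Ha: eta_S > eta_B.
u_stat, p_value = mannwhitneyu(site, background,
                               alternative='greater', method='exact')
print(u_stat, p_value)  # 12.0, ~0.0286
```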

In this particular example, there is an obvious distinction in terms of the rankings. Even prior to conducting the test, it is reasonable to expect the alternative hypothesis to be supported simply because of the rankings themselves. In most situations, the delineations are not as crisp. Yet, the Mann-Whitney test is quite powerful even under these circumstances.

Summary

In the previous introductory statistics course, we covered the basics of statistical analysis. In this advanced course, we have built on the past and covered several topics that permit the environmental professional to examine data in a more complete fashion and to make decisions using a relatively sophisticated set of statistical tools.

In one sitting, no one can become an expert in the use of these tools. In fact, what we have addressed in this course is merely a sampler of what is available when it comes to conducting statistical analyses.

Nevertheless, the true purpose of this course is to impress upon the user that data do behave! Our charge as analysts is to be familiar with patterns of behavior under a variety of conditions. After discovering and characterizing this behavior, we become much better equipped to prescribe a procedure for analyzing the data.

More specifically, beyond simply generating sample statistics and naively using them, we have emphasized the importance of knowing something about the behavior patterns of sample statistics. The reason is that we will be using them to make decisions, so it is imperative that we know how they behave. We have seen that the Central Limit Theorem provides an explanation about the behavior of the sample mean and that this behavior is orderly and depends on the number of observations that go into its calculation.

Ultimately, we use these statistics to make inferences about their population counterparts. We have seen that these inferences (either through the formal hypothesis test approach or through the calculation of a confidence interval) depend heavily upon parametric assumptions about how the data were generated; e.g., that they come from a normal distribution. Under these circumstances, we use the z-distribution or t-distribution when we proceed with hypothesis testing or confidence interval construction.

We have emphasized the importance of testing for normality of the data as a prerequisite for actually using z or t. If the data prove not to be normally distributed, we have proposed a logarithmic transformation. We then test these transformed observations for normality. If normality of the transformed observations holds, we can proceed with devising appropriate sample statistics and, in turn, conduct hypothesis tests or construct confidence intervals using z or t. If the transformed observations also prove not to be normally distributed, or if our data set is so small in the first place that the parametric assumptions cannot be tested, we can resort to nonparametric procedures.

Because generating data is so costly, we have presented procedures for calculating the sample size needed to achieve desired levels of precision. We have also shown how to calculate and to minimize the probabilities of making incorrect decisions in a hypothesis-testing context.

References

Gilbert, Richard O., 1987. Statistical Methods for Environmental Pollution Monitoring. Van Nostrand Reinhold, New York.

Ginevan, Michael E. and Douglas E. Splitstone, 2004. Statistical Tools for Environmental Quality Measurement. Chapman & Hall, New York.

Land, C.E., 1971. Confidence intervals for linear functions of the normal mean and variance. Annals of Mathematical Statistics 42:1187-1205.

Land, C.E., 1975. Tables of confidence limits for linear functions of the normal mean and variance, in Selected Tables in Mathematical Statistics, vol. III. American Mathematical Society, Providence, R.I., pp. 385-419.

Millard, Steven P. and Nagaraj K. Neerchal, 2001. Environmental Statistics with S-Plus. CRC Press, Boca Raton, Florida.
