Rsch 6110
Richard G. Lambert
Some Lecture Materials for Discussing Statistical Power

Review the following terms and issues:

The 2x2 decision graphic

Correct decision when the null hypothesis is true

Type I error

Correct decision when the null hypothesis is false

Type II error

Alpha

Beta

Power

There are mathematical formulae designed to calculate statistical power for each experimental design. These allow you to create, or iterate through, "what if" scenarios. You can examine what your chances are of showing that the treatment works, given a specific research design and level of treatment effectiveness. It is always wise to go through this exercise before conducting an experiment. It not only helps you modify your design to get the odds in your favor, but also forces you to think through all of the important issues related to experimental design.
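As an illustration of a "what if" exercise, here is a minimal sketch of one such formula. It assumes the simplest case: a one-tailed (upper-tail), one-sample z test with a known population standard deviation. The function name `z_test_power` and all scenario values are hypothetical and chosen only for illustration; other designs call for other formulae.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Approximate power of a one-tailed (upper-tail) one-sample z test.

    effect: assumed true mean shift produced by the treatment
    sigma:  assumed known population standard deviation
    n:      sample size
    alpha:  Type I error rate
    """
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    z_critical = std_normal.inv_cdf(1 - alpha)
    # Area of the "null is false" sampling distribution beyond the critical value.
    return 1 - std_normal.cdf(z_critical - effect / standard_error)

# Hypothetical what-if scenarios: (effect, sigma, n, alpha)
scenarios = [
    (5, 15, 25, 0.05),
    (5, 15, 100, 0.05),
    (8, 15, 25, 0.05),
    (5, 15, 25, 0.01),
]
for effect, sigma, n, alpha in scenarios:
    power = z_test_power(effect, sigma, n, alpha)
    print(f"effect={effect} sigma={sigma} n={n} alpha={alpha} power={power:.3f}")
```

Changing one argument at a time while holding the others fixed mirrors the "all else remains the same" comparisons used throughout these notes.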

There are only four variables that can adjust the statistical power of an experiment:

1.) Sample size

2.) Variability of the population

3.) Effect size associated with the treatment.

4.) Alpha

Now let’s look at the specific influence that adjustments in each of these variables have on the statistical power of an experiment: (Illustrate each on the screen from sampdist.xls)

1.) Sample size

This is the most straightforward way to influence power. Assuming that all else remains the same, if sample size goes up, power goes up. If sample size goes down, power goes down. Changing the sample size when all else remains the same changes the standard error of the statistic being used: a larger sample lowers it and a smaller sample inflates it. Greater precision in estimation reduces the variability of the sampling distributions for both the null is true and null is false conditions and thus increases power. If the means of the two sampling distributions remain just as far apart, that is, if the treatment effect remains constant, then as sample size increases the sampling distributions become more peaked. The correct decision region therefore grows relative to the beta region in the distribution for the condition where the null is false.
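The relationship between sample size, standard error, and power can be sketched numerically. The example below assumes a one-tailed, one-sample z test with known sigma; the default effect of 5 points and sigma of 15 are hypothetical values, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def se_and_power(n, effect=5.0, sigma=15.0, alpha=0.05):
    """Return (standard error, power) for a one-tailed one-sample z test.

    effect, sigma, and alpha defaults are illustrative assumptions.
    """
    z = NormalDist()
    standard_error = sigma / sqrt(n)
    # A smaller standard error pushes more of the "null is false"
    # distribution past the critical value.
    power = 1 - z.cdf(z.inv_cdf(1 - alpha) - effect / standard_error)
    return standard_error, power

for n in (10, 25, 50, 100, 200):
    se, power = se_and_power(n)
    print(f"n={n:>3}  standard error={se:.2f}  power={power:.3f}")
```

With the effect and sigma held fixed, each increase in n shrinks the standard error and raises power.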

2.) Variability of the population

Reducing the variability of the population from which samples are selected has the effect of increasing power, given that all other variables remain the same. Similarly, increasing the variability of the population decreases power. This process works in much the same way as the sample size changes. Remember the standard error of the mean or percentage formulae. Both are ratios of the variability in the population to the square root of the sample size. Therefore, changes in either will adjust precision.
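As a reminder of the two standard error formulae mentioned above, here is a small sketch. The function names `se_mean` and `se_proportion` and the input values are hypothetical.

```python
from math import sqrt

def se_mean(sigma, n):
    """Standard error of the mean: population SD over the square root of n."""
    return sigma / sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p * (1 - p)) over the square root of n."""
    return sqrt(p * (1 - p) / n)

# Lowering population variability and raising n both improve precision.
print("sigma=15, n=25 :", se_mean(15, 25))
print("sigma=10, n=25 :", se_mean(10, 25))
print("sigma=15, n=100:", se_mean(15, 100))
print("p=0.5,   n=100 :", se_proportion(0.5, 100))
```

In both formulae the numerator reflects variability in the population and the denominator grows with sample size, which is why the two adjustments act on power in the same way.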

3.) Effect size associated with the treatment

When all else remains the same, anything that will increase the effect of the treatment in the experiment will increase statistical power while weakening the treatment lowers statistical power. This can be explained by remembering that the distance between the means of the two sampling distributions is determined by the strength of the treatment effect. The more powerful the impact of the treatment, the further the null is false distribution will move to the right. As it moves to the right, the beta region becomes smaller.
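To see the beta region shrink as the treatment effect grows, consider the sketch below. It again assumes a one-tailed, one-sample z test with known sigma; the sigma, n, and effect values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_for_effect(effect, sigma=15.0, n=25, alpha=0.05):
    """Type II error rate (beta) for a one-tailed one-sample z test.

    Beta is the area of the "null is false" distribution that falls
    to the left of the critical value.
    """
    z = NormalDist()
    standard_error = sigma / sqrt(n)
    return z.cdf(z.inv_cdf(1 - alpha) - effect / standard_error)

for effect in (0, 2, 4, 6, 8, 10):
    beta = beta_for_effect(effect)
    print(f"effect={effect:>2}  beta={beta:.3f}  power={1 - beta:.3f}")
```

When the effect is zero the two sampling distributions coincide, so beta equals 1 minus alpha and "power" is simply alpha.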

4.) Alpha

If the other three variables remain constant, then reducing the probability of making a Type I error, saying that the treatment works when it really does not, will lower power. For example, moving from alpha = .05 to alpha = .01, or becoming more conservative with respect to Type I errors, will lower power. The mechanism that causes this loss of power stems from the critical value moving to the right along the x axis in order to cut off a smaller area under the sampling distribution for the condition when the null hypothesis is true. This in turn puts more of the area under the sampling distribution for the condition when the null hypothesis is false to the left of the critical value and increases the size of the beta region.

The opposite will happen if alpha becomes more liberal, or is raised from say .01 to .05. With the higher alpha value the null hypothesis will be rejected more often because the critical value will have been moved to the left on the x axis. This will result in declaring that the treatment works more often and will increase power.
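The movement of the critical value can be sketched as follows, assuming a one-tailed, one-sample z test with known sigma. The critical value is expressed as a distance above the null mean in raw score units; the effect, sigma, and n defaults are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def critical_value_and_power(alpha, effect=5.0, sigma=15.0, n=25):
    """Return (critical value above the null mean, power) for a given alpha."""
    standard_error = sigma / sqrt(n)
    # Cut off an area of size alpha in the upper tail of the "null is true" distribution.
    critical = NormalDist().inv_cdf(1 - alpha) * standard_error
    # Power is the area of the "null is false" distribution to the right of that cut.
    power = 1 - NormalDist(mu=effect, sigma=standard_error).cdf(critical)
    return critical, power

for alpha in (0.01, 0.05, 0.10):
    critical, power = critical_value_and_power(alpha)
    print(f"alpha={alpha:.2f}  critical value={critical:.2f}  power={power:.3f}")
```

A smaller alpha yields a larger critical value (farther to the right) and lower power; a larger alpha does the reverse.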

Now let’s talk about practical ways that you can actually change the power of an experiment in each of these areas:

1.) Sample size

a. Increase sample size as much as is practically possible.

b. Increase response rate.

c. Consider shortening the data collection or treatment protocol to reduce cost. Giving up some information in order to lower the cost per subject, so that you can afford to collect data from more subjects, may be a valuable trade-off.

d. Add more sites.

e. Using multiple measurements over time increases the sensitivity of the design without increasing the number of subjects.

2.) Variability of the population

The central issue here is the control of nuisance variables. Nuisance or confounding variables have the effect of increasing the variability in the population and therefore decreasing power. There are four ways to control them:

a. Admissibility criteria. If you narrow the focus of the study to a more specific subgroup in the population, this may have the effect of studying a more homogeneous group.

b. Controlling or standardizing the experimental environment. If you make sure that the study protocol is extensive enough to eliminate variance that is due to differences in implementation or distractions to the subjects, then variance in the dependent variable will be reduced and power will be increased.

c. Building more variables into the design. Making our theories more extensive always helps create more homogeneous cells within our design and thus reduces variance and increases power. Anticipating and planning to test for the effect of extraneous influences on the dependent variable makes for tighter experimental designs, which can result in easier interpretation of results.

d. Statistical control. Analysis of covariance and other statistical methods for controlling nuisance variables can increase power.

Remember that the variability discussed here is in the scores or observations of the dependent variable. Therefore, there are a few other considerations in reducing the variability in the population:

e. Reducing measurement error in the dependent variable will increase power.
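Two of the items above, statistical control and measurement error, can be sketched numerically. The example rests on textbook approximations that are assumptions here, not part of the lecture: adjusting for a covariate correlated r with the dependent variable leaves a residual standard deviation of sigma times the square root of (1 - r squared), ignoring the degree of freedom spent on the covariate; and under classical test theory, reliability is the ratio of true score variance to observed score variance. All function names and values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-tailed one-sample z test with known sigma."""
    z = NormalDist()
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - effect / (sigma / sqrt(n)))

def sigma_after_covariate(sigma, r):
    """Approximate residual SD after adjusting for a covariate correlated r with the DV."""
    return sigma * sqrt(1 - r ** 2)

def observed_sigma(true_sigma, reliability):
    """Observed-score SD implied by classical test theory (assumed model)."""
    return true_sigma / sqrt(reliability)

effect, n = 5.0, 25

# Statistical control: a stronger covariate removes more nuisance variance.
for r in (0.0, 0.3, 0.6):
    sigma = sigma_after_covariate(15.0, r)
    print(f"covariate r={r:.1f}  sigma={sigma:.2f}  power={z_test_power(effect, sigma, n):.3f}")

# Measurement error: a more reliable dependent variable has less observed variance.
for reliability in (0.64, 0.80, 0.95):
    sigma = observed_sigma(12.0, reliability)
    print(f"reliability={reliability:.2f}  sigma={sigma:.2f}  power={z_test_power(effect, sigma, n):.3f}")
```

Both routes lower the variability of the dependent variable scores and therefore raise power, with the treatment effect and sample size unchanged.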

3.) Effect size associated with the treatment.

Of course it is difficult to directly influence the strength of a treatment. However, there are several issues to consider that can have an impact on the observed treatment effect:

a. Consider whether the dosage of the treatment is appropriate and sufficient to give the best chance to show the true treatment effects.

b. Consider whether the duration is appropriate and sufficient to show the true treatment effects.

c. Consider whether the training, supervision, and monitoring of those implementing the treatment is sufficient to allow the true treatment effects to emerge.

d. Make sure there is a strong linkage between the treatment and the dependent variable. The dependent variable should be valid for the purpose of showing the treatment effects under investigation.

4.) Alpha

This is largely determined by tradition. Although it is conceptually necessary to include this issue in this discussion, and while the researcher can select any alpha level, this decision is really made at the level of editorial policy, grant review protocol, and government regulation. In practical terms, this decision is not really under the direct control of the experimenter. While an experimenter can set alpha at any level, breaking from the traditions and recommended practices in a given discipline typically will not result in publications, funded grants, or approved treatments.

However, there is an alpha-related decision that the researcher can use to impact statistical power. If all other factors remain the same, a one-tailed significance test is more powerful than a two-tailed test, provided the true effect lies in the predicted direction. This concept can easily be seen graphically. If all of the rejection region is placed in one tail, the critical value will be moved to the left and thus more of the area under the null is false distribution will be to the right of the critical value. The decision to use a one-tailed test involves judgment as to whether the theory and literature in the field would justify an expectation regarding the directionality of the treatment effects.
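The comparison between one-tailed and two-tailed tests can be sketched as follows, assuming a one-sample z test with known sigma and a true effect in the predicted (upper) direction. The function names and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed(effect, sigma, n, alpha=0.05):
    """Power when the whole rejection region sits in the upper tail."""
    z = NormalDist()
    shift = effect / (sigma / sqrt(n))
    return 1 - z.cdf(z.inv_cdf(1 - alpha) - shift)

def power_two_tailed(effect, sigma, n, alpha=0.05):
    """Power when alpha is split evenly between the two tails."""
    z = NormalDist()
    shift = effect / (sigma / sqrt(n))
    z_critical = z.inv_cdf(1 - alpha / 2)
    upper = 1 - z.cdf(z_critical - shift)   # rejections in the upper tail
    lower = z.cdf(-z_critical - shift)      # rejections in the lower tail
    return upper + lower

for n in (25, 50, 100):
    one = power_one_tailed(5, 15, n)
    two = power_two_tailed(5, 15, n)
    print(f"n={n:>3}  one-tailed power={one:.3f}  two-tailed power={two:.3f}")
```

The one-tailed advantage holds only when the effect falls in the predicted direction; if it falls in the opposite tail, the one-tailed test has almost no chance of detecting it.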