A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences

Behavior Research Methods 2007, 39 (2), 175-191 G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences FRANZ FAUL Christian-Albrechts-Universität Kiel, Kiel, Germany EDGAR ERDfeLDER Universität Mannheim, Mannheim, Germany AND ALBERT-GEORG LANG AND AXEL BUCHNER Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Win- dows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and c2 test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free. Statistics textbooks in the social, behavioral, and biomed- reviews of which we are aware (Kornbrot, 1997; Ortseifen, ical sciences typically stress the importance of power analy- Bruckner, Burke, & Kieser, 1997; Thomas & Krebs, 1997). ses. By definition, the power of a statistical test is the prob- It has been used in several power tutorials (e.g., Buchner, ability that its null hypothesis (H0) will be rejected given that Erdfelder, & Faul, 1996, 1997; Erdfelder, Buchner, Faul, & it is in fact false. Obviously, significance tests that lack sta- Brandt, 2004; Levin, 1997; Sheppard, 1999) and in statis- tistical power are of limited use because they cannot reliably tics textbooks (e.g., Field, 2005; Keppel & Wickens, 2004; discriminate between H0 and the alternative hypothesis (H1) Myers & Well, 2003; Rasch, Friese, Hofmann, & Naumann, of interest. However, although power analyses are indispens- 2006a, 2006b). Nevertheless, the user feedback that we re- able for rational statistical decisions, it was not until the late ceived coincided with our own experience in showing some 1980s that power charts (see, e.g., Scheffé, 1959) and power limitations and weaknesses of G*Power 2 that required a tables (see, e.g., Cohen, 1988) were supplemented by more major extension and revision. efficient, precise, and easy-to-use power analysis programs In the present article, we describe G*Power 3, a program for personal computers (Goldstein, 1989). G*Power 2 (Erd- that was designed to address the problems of G*Power 2. felder, Faul, & Buchner, 1996) can be seen as a second- We begin with an outline of the major improvements in generation power analysis program designed as a stand- G*Power 3 and then discuss the types of power analyses cov- alone application to handle several types of statistical tests ered by this program. Next, we describe program handling commonly used in social and behavioral research. In the past and the types of statistical tests to which it can be applied. 10 years, this program has been found useful not only in the We then discuss the statistical algorithms of G*Power 3 and social and behavioral sciences but also in many other disci- their accuracy. Finally, program availability and some Inter- plines that routinely apply statistical tests, including biology net resources supporting users of G*Power 3 are described. (Baeza & Stotz, 2003), genetics (Akkad et al., 2006), ecol- ogy (Sheppard, 1999), forest and wildlife research (Mellina, IMPROVEMENTS IN G*POWER 3 IN Hinch, Donaldson, & Pearson, 2005), the geosciences (Bus- COMPARISON WiTH G*POWER 2 bey, 1999), pharmacology (Quednow et al., 2004), and med- ical research (Gleissner, Clusmann, Sassen, Elger, & Helm- G*Power 3 is an improvement over G*Power 2 in five staedter, 2006). G*Power 2 was evaluated positively in the major respects. First, whereas G*Power 2 requires the E. Erdfelder, [email protected] 175 Copyright 2007 Psychonomic Society, Inc. 176 FAUL, ERDFELDER, LANG, AND BUCHNER DOS and Mac OS 7–9 operating systems that were com- research process, and the specific research question, five mon in the 1990s but are now outdated, G*Power 3 runs different types of power analysis can be reasonable (cf. on the personal computer platforms currently in widest Erdfelder et al., 2004; Erdfelder, Faul, & Buchner, 2005). use: Windows XP, Windows Vista, and Mac OS X 10.4. We describe these methods and their uses in turn. The Windows and Mac versions of the program are essentially equivalent. They use the same computational A Priori Power Analyses routines and share very similar user interfaces. For this In a priori power analyses (Cohen, 1988), sample reason, we will not differentiate between these versions in size N is computed as a function of the required power what follows; users simply have to make sure to download level (1 2 b), the prespecified significance level a, and the version appropriate for their operating system. the population effect size to be detected with probability Second, whereas G*Power 2 is limited to three types 1 2 b. A priori analyses provide an efficient method of of power analyses, G*Power 3 supports five different controlling statistical power before a study is actually con- ways to assess statistical power. In addition to the a pri- ducted (see, e.g., Bredenkamp, 1969; Hager, 2006) and ori, post hoc, and compromise power analyses that were can be recommended whenever resources such as the time already covered by G*Power 2, the new program offers and money required for data collection are not critical. sensitivity analyses and criterion analyses. Third, G*Power 3 provides dedicated power analysis Post Hoc Power Analyses options for a variety of frequently used t, F, z, c2, and In contrast to a priori power analyses, post hoc power exact tests in addition to the standard tests covered by analyses (Cohen, 1988) often make sense after a study G*Power 2. The tests captured by G*Power 3 and their has already been conducted. In post hoc analyses, 1 2 b effect size parameters are described in the Program Han- is computed as a function of a, the population effect size dling section. Importantly, users are not limited to these parameter, and the sample size(s) used in a study. It thus tests because G*Power 3 also offers power analyses for becomes possible to assess whether or not a published generic t, F, z, c2, and binomial tests for which the non- statistical test in fact had a fair chance of rejecting an in- centrality parameter of the distribution under H1 may correct H0. Importantly, post hoc analyses, like a priori be entered directly. In this way, users are provided with analyses, require an H1 effect size specification for the a flexible tool for computing the power of basically any underlying population. Post hoc power analyses should statistical test that uses t, F, z, c2, or binomial reference not be confused with so-called retrospective power anal- distributions. yses, in which the effect size is estimated from sample Fourth, statistical tests can be specified in G*Power 3 data and used to calculate the observed power, a sample using two different approaches: the distribution-based ap- estimate of the true power.1 Retrospective power analy- proach and the design-based approach. In the distribution- ses are based on the highly questionable assumption that based approach, users select the family of the test statistic the sample effect size is essentially identical to the effect (t, F, z, c2, or exact test) and the particular test within size in the population from which it was drawn (Zumbo & that family. This is how power analyses were specified in Hubley, 1998). Obviously, this assumption is likely to be G*Power 2. In addition, a separate menu in G*Power 3 false, and the more so the smaller the sample. In addition, provides access to power analyses via the design-based sample effect sizes are typically biased estimates of their approach: Users select (1) the parameter class to which population counterparts (Richardson, 1996). For these the statistical test refers (correlations, means, proportions, reasons, we agree with other critics of retrospective power regression coefficients, variances) and (2) the design of analyses (e.g., Gerard, Smith, & Weerakkody, 1998; Hoe- the study (e.g., number of groups, independent vs. depen- nig & Heisey, 2001; Kromrey & Hogarty, 2000; Lenth, dent samples). On the basis of the feedback we received 2001; Steidl, Hayes, & Schauber, 1997). Rather than use about G*Power 2, we expect that some users might find retrospective power analyses, researchers should specify the design-based input mode more intuitive and easier to population effect sizes on a priori grounds. To specify the use. effect size simply means to define the minimum degree Fifth, G*Power 3 supports users with enhanced graph- of violation of H0 a researcher would like to detect with ics features. The details of these features will be outlined a probability not less than 1 2 b. Cohen’s definitions of in the Program Handling section. small, medium, and large effects can be helpful in such effect size specifications (see, e.g., Smith & Bayen, 2005). TYPES OF STATISTicAL POWER ANALYSES However, researchers should be aware of the fact that these conventions may have different meanings for differ- The power (1 2 b) of a statistical test is the complement ent tests (cf. Erdfelder et al., 2005). of b, which denotes the Type II or beta error probability of falsely retaining an incorrect H0. Statistical power de- Compromise Power Analyses pends on three classes of parameters: (1) the significance In compromise power analyses (Erdfelder, 1984; level (i.e., the Type I error probability) a of the test, (2) the Erdfelder et al., 1996; Müller, Manz, & Hoyer, 2002), size(s) of the sample(s) used for the test, and (3) an effect both a and 1 2 b are computed as functions of the ef- size parameter defining H1 and thus indexing the degree fect size, N, and the error probability ratio q 5 b/a.

A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences

Binomials, Contingency Tables, Chi-Square Tests

Binomial Test Models and Item Difficulty

Bitest — Binomial Probability Test

Jise.Org/Volume31/N1/Jisev31n1p72.Html

The Exact Binomial Test Between Two Independent Proportions: a Companion

A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences

Week 10: Chapter 10

Power of the Binomial Test Is Computed Using the Binomial (Exact) Sampling Distribution of the Test Statistic

Sample Size Calculation with Gpower

Applied Nonparametric Statistical Tests to Compare Evolutionary

Nonparametric Test Procedures 1 Introduction to Nonparametrics

A Practical Guide to Modeling Financial Risk with MATLAB a Practical Guide to Modeling Financial Risk with MATLAB