<<


Appendix B: Statistics in Psychological Research

Understanding and interpreting the results of psychological research depends on statistical analyses, which are methods for describing and drawing conclusions from data. The chapter on research in psychology introduced some terms and concepts associated with descriptive statistics (the numbers that psychologists use to describe and present their data) and with inferential statistics (the mathematical procedures used to draw conclusions from data and to make inferences about what they mean). Here, we present more details about these statistical analyses that will help you to evaluate research results.

Describing Data

To illustrate our discussion, consider a hypothetical experiment on the effects of incentives on performance. The experimenter presents a list of mathematics problems to two groups of participants. Each group must solve the problems within a fixed time, but for each correct answer, the low-incentive group is paid ten cents, whereas the high-incentive group gets one dollar. The hypothesis to be tested is the null hypothesis, the assertion that the independent variable manipulated by the experimenter will have no effect on the dependent variable measured by the experimenter. In this case, the null hypothesis is that the size of the incentive (the independent variable) will not affect performance on the mathematics task (the dependent variable).

Assume that the experimenter has gathered a representative sample of participants, assigned them randomly to the two groups, and done everything possible to avoid the confounds and other research problems discussed in the chapter on research in psychology. The experiment has been run, and the experimenter now has the data: a list of the number of correct answers given by each participant in each group. Now comes the first task of statistical analysis: describing the data in a way that makes them easy to understand.

The Frequency Histogram  The simplest way to describe the data is to draw up something like Table 1, in which all the numbers are simply listed. After examining the table, you might notice that the high-incentive group seems to have done better than the low-incentive group, but this is not immediately obvious. The difference might be even harder to see if more participants had been involved and if the scores included three-digit numbers. A picture is worth a thousand words, so a better way of presenting the same data is in a picture-like graphic known as a frequency histogram (see Figure 1).

Construction of a histogram is simple. First, divide the scale for measuring the dependent variable (in this case, the number of correct answers) into a number of categories, or "bins." The bins in our example are 1–2, 3–4, 5–6, 7–8, and 9–10. Next, sort the raw data into the appropriate bin. (For example, the score of a participant who had 5 correct answers would go into the 5–6 bin, a score of 8 would go into the 7–8 bin, and so on.) Finally, for each bin, count the number of scores in that bin and draw a bar up to the height of that number on the vertical axis of a graph. The resulting set of bars makes up the frequency histogram.

null hypothesis: The assertion that the independent variable manipulated by the experimenter will have no effect on the dependent variable measured by the experimenter.

frequency histogram: A graphic presentation of data that consists of a set of bars, each of which represents how frequently different scores or values occur in a data set.

descriptive statistics: Numbers that summarize a set of research data.
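To make the binning procedure concrete, here is a minimal sketch in Python (an illustration, not part of the original text) that sorts the low-incentive scores from Table 1 into the five bins named above and prints a text version of the histogram bars:

    # Sort the low-incentive scores from Table 1 into the bins described above.
    low_incentive = [4, 6, 2, 7, 6, 8, 3, 5, 2, 3, 5, 9, 5]
    bins = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

    counts = {b: 0 for b in bins}
    for score in low_incentive:
        for low, high in bins:
            if low <= score <= high:
                counts[(low, high)] += 1
                break

    # One row of asterisks per bin stands in for the bars of the histogram.
    for (low, high), n in counts.items():
        print(f"{low}-{high}: {'*' * n}")

Even printed this way, the "bars" (2, 3, 5, 2, and 1 scores per bin) show the roughly bell-shaped pattern discussed later in this appendix.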



TABLE 1 A Simple Data Set

Here are the test scores obtained by thirteen participants performing under low-incentive conditions and thirteen participants performing under high-incentive conditions.

Low Incentive    High Incentive
4                6
6                4
2                10
7                10
6                7
8                10
3                6
5                7
2                5
3                9
5                9
9                3
5                8

Because we are interested in comparing the scores of two groups, there are separate histograms in Figure 1: one for the high-incentive group and one for the low-incentive group. Now the difference between groups that was difficult to see in Table 1 becomes clearly visible: High scores were more common among people in the high-incentive group than among people in the low-incentive group.

Histograms and other pictures of data are useful for visualizing and better understanding the "shape" of research results, but in order to analyze those results statistically, we need to use other ways of handling the data that make up these graphic presentations. For example, before we can tell whether two histograms are different statistically or just visually, the data they represent must be summarized using descriptive statistics.

Descriptive Statistics

The four basic categories of descriptive statistics (1) measure the number of observations made; (2) summarize the typical value of a set of data; (3) summarize the spread, or variability, in a set of data; and (4) express the correlation between two sets of data.

N  The easiest statistic to compute, abbreviated as N, simply describes the number of observations that make up the data set. In Table 1, for example, N = 13 for each group, or 26 for the entire data set. Simple as it is, N plays a very important role in more sophisticated statistical analyses.

Measures of Central Tendency  It is apparent in the histograms in Figure 1 that there is a difference in the pattern of scores between the two groups. But how much of a difference? What is the typical value, the central tendency, that represents each group's performance? As described in the chapter on research in psychology, there are three measures that capture this typical value: the mode, the median, and the mean. Recall that the mode is the value or score that occurs most frequently in the data set. The median is the halfway point in a set of data: Half the scores fall above the median, half fall below it. The mean is the arithmetic average. To find the mean, add up the values of all the scores and divide that total by the number of scores.

FIGURE 1  Frequency Histograms: The height of each bar of a histogram represents the number of scores falling within each range of score values. The pattern formed by these bars gives a visual image of how research results are distributed.

Measures of Variability  The variability, or spread, or dispersion, of a set of data is often just as important as its central tendency. This variability can be quantified by measures known as the range and the standard deviation.
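The three central-tendency measures just listed are easy to verify before moving on to variability. The sketch below (an illustration, not from the text) checks the mode, median, and mean of the low-incentive scores in Table 1 using Python's standard statistics module:

    import statistics

    low_incentive = [4, 6, 2, 7, 6, 8, 3, 5, 2, 3, 5, 9, 5]
    print(statistics.mode(low_incentive))    # 5, the most frequent score
    print(statistics.median(low_incentive))  # 5, the halfway point of the sorted scores
    print(statistics.mean(low_incentive))    # 5, the arithmetic average (65/13)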


TABLE 2 Calculating the Standard Deviation

The standard deviation of a set of scores reflects the average degree to which those scores differ from the mean of the set.

Raw Data    Difference from Mean (D)    D²
2           2 − 4 = −2                  4
2           2 − 4 = −2                  4
3           3 − 4 = −1                  1
4           4 − 4 = 0                   0
9           9 − 4 = 5                   25

Mean = 20/5 = 4                         ∑D² = 34

\[ \text{Standard deviation} = \sqrt{\frac{\sum D^2}{N}} = \sqrt{\frac{34}{5}} = \sqrt{6.8} \approx 2.6 \]

Note: ∑ means “the sum of.”

As described in the chapter on research in psychology, the range is simply the difference between the highest and the lowest values in a data set. For the data in Table 1, the range for the low-incentive group is 9 − 2 = 7; for the high-incentive group, the range is 10 − 3 = 7.

The standard deviation, or SD, measures the average difference between each score and the mean of the data set. To see how the standard deviation is calculated, consider the data in Table 2. The first step is to compute the mean of the set, in this case, 20/5 = 4. Second, calculate the difference, or deviation (D), of each score from the mean by subtracting the mean from each score, as in column 2 of Table 2. Third, find the average of these deviations. Notice, though, that if you calculated this average by finding the arithmetic mean, you would sum the deviations and find that the negative deviations exactly balance the positive ones, resulting in a mean difference of 0. Obviously there is more than zero variation around the mean in the data set. So, instead of employing the arithmetic mean, you compute the standard deviation by first squaring the deviations (which, as shown in column 3 of Table 2, removes any negative values). You then add up these squared deviations, divide the total by N, and take the square root of the result. These simple steps are outlined in more detail in Table 2.
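The same steps can be written out in a few lines of Python. This is a sketch of Table 2's procedure, not code from the text:

    import math

    scores = [2, 2, 3, 4, 9]
    mean = sum(scores) / len(scores)            # 20/5 = 4
    deviations = [x - mean for x in scores]     # -2, -2, -1, 0, 5 (these sum to 0)
    squared = [d ** 2 for d in deviations]      # 4, 4, 1, 0, 25
    sd = math.sqrt(sum(squared) / len(scores))  # sqrt(34/5) = sqrt(6.8)
    print(round(sd, 1))                         # 2.6

Note that dividing by N, as Table 2 does, is what Python's statistics.pstdev computes; statistics.stdev divides by N − 1 instead, a distinction that matters mainly for small samples.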

The Normal Distribution  Now that we have described histograms and reviewed some descriptive statistics, let's reexamine how these methods of representing research data relate to some of the concepts discussed elsewhere in the book. In most subfields in psychology, when researchers collect many measurements and plot their data in histograms, the resulting pattern often resembles the one shown for the low-incentive group in Figure 1. That is, the majority of scores tend to fall in the middle of the distribution, with fewer and fewer scores occurring as one moves toward the extremes. As more and more data are collected, and as smaller and smaller bins are used (perhaps containing only one value each), histograms tend to smooth out until they resemble the bell-shaped curve known as the normal distribution, or normal curve. When a distribution of scores follows a truly normal curve, its mean, median, and mode all have the same value. Furthermore, if the curve is normal, we can use its standard deviation to describe how any particular score stands in relation to the rest of the distribution.

IQ scores provide an example. They are distributed in a normal curve, with a mean, median, and mode of 100 and an SD of 16, as shown in Figure 2.

range: A measure of variability that is the difference between the highest and the lowest values in a data set.

standard deviation (SD): A measure of variability that is the average difference between each score and the mean of the data set.

normal distribution: A dispersion of scores such that the mean, median, and mode all have the same value. When a distribution has this property, the standard deviation can be used to describe how any particular score stands in relation to the rest of the distribution.



FIGURE 2  The Normal Distribution: Many kinds of research data approximate the balanced, or symmetrical, shape of the normal curve, in which most scores fall toward the center of the range. Here, the normal distribution of IQ: standard deviations from −2 to +2 correspond to IQ scores of 68, 84, 100, 116, and 132; 68 percent of the scores lie within one standard deviation of the mean, and about 95 percent lie within two.

In such a distribution, half of the population will have an IQ above 100, and half will be below 100. The shape of the true normal curve is such that 68 percent of the area under it lies in a range within one standard deviation above and below the mean. In terms of IQ, this means that 68 percent of the population has an IQ somewhere between 84 (100 minus 16) and 116 (100 plus 16). Of the remaining 32 percent of the population, half falls more than 1 SD above the mean, and half falls more than 1 SD below the mean. Thus, 16 percent of the population has an IQ above 116, and 16 percent scores below 84.

The normal curve is also the basis for percentiles. A percentile score indicates the percentage of people or observations that fall below a given score in a normal distribution. In Figure 2, for example, the mean score (which is also the median) lies at a point below which 50 percent of the scores fall. Thus the mean of a normal distribution is at the 50th percentile. What does this say about IQ? If you score 1 SD above the mean, your score is at a point above which only 16 percent of the population falls. This means that 84 percent of the population (100 percent minus 16 percent) must be below that score; so this IQ score is at the 84th percentile. A score 2 SDs above the mean is at the 97.5th percentile, because only 2.5 percent of the scores are above it in a normal distribution.

Scores may also be expressed in terms of their distance in standard deviations from the mean, producing what are called standard scores. A standard score of 1.5, for example, is 1.5 standard deviations from the mean.
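Python's standard library can reproduce these percentile figures directly. The sketch below is an illustration under the IQ parameters stated above (mean 100, SD 16), not part of the text; it uses statistics.NormalDist:

    from statistics import NormalDist

    iq = NormalDist(mu=100, sigma=16)
    print(iq.cdf(116))      # about 0.84: an IQ 1 SD above the mean is at the 84th percentile
    print(iq.cdf(132))      # about 0.977: an IQ 2 SDs above the mean, roughly the 97.5th percentile
    print(1 - iq.cdf(116))  # about 0.16: the share of the population scoring above 116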

Correlation  Histograms and measures of central tendency and variability describe certain characteristics of one dependent variable at a time. However, psychologists are often interested in describing the relationship between two variables. Measures of correlation are frequently used for this purpose. We discussed the interpretation of the correlation coefficient in the chapter on research in psychology; here we describe how to calculate it.

Recall that correlations are based on the relationship between two numbers that are associated with each participant or observation. The numbers might represent, say, a person's height and weight or the IQ scores of a parent and child. Table 3 contains this kind of data for four participants from our incentives study who took the test twice. (As you may recall from the chapter on cognitive abilities, the correlation between their scores would be a measure of test-retest reliability.)

percentile score: A value that indicates the percentage of people or observations that fall below a given point in a normal distribution.

standard score: A value that indicates the distance, in standard deviations, between a given score and the mean of all the scores in a data set.


The formula for computing the Pearson product-moment correlation, or r, is as follows:

\[ r = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \, \sum (y - M_y)^2}} \]

where:
x = each score on variable 1 (in this case, test 1)
y = each score on variable 2 (in this case, test 2)
Mx = the mean of the scores on variable 1
My = the mean of the scores on variable 2

The main function of the denominator (bottom part) in this formula is to ensure that the coefficient ranges from −1.00 to +1.00, no matter how large or small the values of the variables being correlated. The "action element" of this formula is the numerator (or top part). It is the result of multiplying the amounts by which each of two observations (x and y) differs from the means of their respective distributions (Mx and My). Notice that, if the two variables "go together" (so that, if one score is large, the score it is paired with is also large, and if one is small, the other is also small), then both scores in each pair will tend to be above the mean of their distribution or both will tend to be below the mean of their distribution. When this is the case, x − Mx and y − My will both be positive, or they will both be negative. In either case, when you multiply one of them by the other, their product will always be positive, and the correlation coefficient will also be positive. If, on the other hand, the two variables go opposite to one another, such that, when one score in a pair is large, the other is small, one of them is likely to be smaller than the mean of its distribution, so that either x − Mx or y − My will have a negative sign and the other will have a positive sign. Multiplying these differences together will always result in a product with a negative sign, and r will be negative as well.

TRY THIS  Now compute the correlation coefficient for the data presented in Table 3. The first step (step a in the table) is to compute the mean (M) for each variable. Mx turns out to be 3, and My is 4. Next, calculate the numerator by finding the differences between each x and y value and its respective mean and by multiplying them (as in step b of Table 3). Notice that, in this example, the differences in each pair have like signs, so the correlation coefficient will be positive. The next step is to calculate the terms in the denominator; in this case, as shown in steps c and d in Table 3, they have values of 18 and 4. Finally, place all the terms in the

TABLE 3 Calculating the Correlation Coefficient

Though it appears complex, calculation of the correlation coefficient is quite simple. The resulting r reflects the degree to which two sets of scores tend to be related, or to co-vary.

Participant    Test 1 (x)    Test 2 (y)    (x − Mx)(y − My)
A              1             3             (1 − 3)(3 − 4) = (−2)(−1) = 2
B              1             3             (1 − 3)(3 − 4) = (−2)(−1) = 2
C              4             5             (4 − 3)(5 − 4) = (1)(1) = 1
D              6             5             (6 − 3)(5 − 4) = (3)(1) = 3

(a) Mx = 3, My = 4
(b) ∑(x − Mx)(y − My) = 8

(c) ∑(x − Mx)² = 4 + 4 + 1 + 9 = 18
(d) ∑(y − My)² = 1 + 1 + 1 + 1 = 4

(e) \[ r = \frac{\sum (x - M_x)(y - M_y)}{\sqrt{\sum (x - M_x)^2 \, \sum (y - M_y)^2}} = \frac{8}{\sqrt{18 \times 4}} = \frac{8}{\sqrt{72}} = \frac{8}{8.48} = +.94 \]


formula and carry out the arithmetic (step e). The result in this case is an r of .94, a high and positive correlation suggesting that performances on repeated tests are very closely related. A participant doing well the first time is very likely to do well again; a person doing poorly at first will probably do no better the second time.
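The whole Table 3 calculation fits in a few lines of Python. This sketch (not from the text) follows the formula term by term:

    import math

    x = [1, 1, 4, 6]  # Test 1 scores for participants A-D
    y = [3, 3, 5, 5]  # Test 2 scores for participants A-D
    mx = sum(x) / len(x)  # Mx = 3
    my = sum(y) / len(y)  # My = 4

    numerator = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # 8
    denominator = math.sqrt(sum((xi - mx) ** 2 for xi in x)
                            * sum((yi - my) ** 2 for yi in y))      # sqrt(18 * 4)
    print(round(numerator / denominator, 2))                        # 0.94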

Inferential Statistics

The descriptive statistics from the incentives experiment tell the experimenter that the performances of the high- and low-incentive groups differ. But there is some uncertainty. Is the difference large enough to be important? Does it represent a stable effect or a fluke? The researcher would like to have some measure of confidence that the difference between groups is genuine and reflects the effect of incentives on mental tasks in the real world, rather than the effect of random or uncontrolled factors. One way of determining confidence would be to run the experiment again with a new group of participants. Confidence that incentives produced differences in performance would grow stronger if the same or a larger between-group difference occurs again. In reality, psychologists rarely have the opportunity to repeat, or replicate, their experiments in exactly the same way three or four times. But inferential statistics provide a measure of how likely it was that results came about by chance. They put a precise mathematical value on the confidence or probability that rerunning the same experiment would yield similar (or even stronger) results.

inferential statistics: A set of procedures that provides a measure of how likely it is that research results came about by chance.

Differences Between Means: The t Test  One of the most important tools of inferential statistics is the t test. It allows the researcher to ask how likely it is that the difference between two means occurred by chance rather than as a function of the effect of the independent variable. When the t test or other inferential statistic says that the probability of chance effects is small enough (usually less than 5 percent), the results are said to be statistically significant. Conducting a t test of statistical significance requires the use of three descriptive statistics.

The first component of the t test is the size of the observed effect, the difference between the means. Recall that the mean is calculated by summing a group's scores and dividing that total by the number of scores. In the example shown in Table 1, the mean of the high-incentive group is 94/13, or 7.23, and the mean of the low-incentive group is 65/13, or 5. So the difference between the means of the high- and low-incentive groups is 7.23 − 5 = 2.23.

Second, we have to know the standard deviation of scores in each group. If the scores in a group are quite variable, the standard deviation will be large, indicating that chance may have played a large role in producing the results. The next replication of the study might generate a very different set of group scores. If the scores in a group are all very similar, however, the standard deviation will be small, which suggests that the same result would probably occur for that group if the study were repeated. In other words, the difference between groups is more likely to be significant when each group's standard deviation is small. If variability is high enough that the scores of two groups overlap, the mean difference, though large, may not be statistically significant. (In Table 1, for example, some people in the low-incentive group actually did better on the math test than some in the high-incentive group.)

Third, we need to take the sample size, N, into account. The larger the number of participants or observations, the more likely it is that an observed difference between means is significant. This is so because, with larger samples, random factors within a group (the unusual performance of a few people who were sleepy or anxious or hostile, for example) are more likely to be canceled out by the majority, who better represent people in general. The same effect of sample size can be seen in coin tossing. If you toss a quarter five times, you might not be too surprised if


heads comes up 80 percent of the time. If you get 80 percent heads after one hundred tosses, however, you might begin to suspect that this is probably not due to chance alone and that some other effect, perhaps some bias in the coin, is significant in producing the results. (For the same reason, even a relatively small correlation coefficient, between diet and grades, say, might be statistically significant if it was based on 50,000 students. As the number of participants increases, it becomes less likely that the correlation reflects the influence of a few oddball cases.)

To summarize, as the differences between the means get larger, as N increases, and as standard deviations get smaller, t increases. This increase in t raises the researcher's confidence in the significance of the difference between means. Let's now calculate the t statistic and see how it is interpreted. The formula for t is:

\[ t = \frac{M_1 - M_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}} \]

where:

M1 = mean of group 1
M2 = mean of group 2
N1 = number of scores or observations for group 1
N2 = number of scores or observations for group 2
S1 = standard deviation of group 1 scores
S2 = standard deviation of group 2 scores

Despite appearances, this formula is quite simple. In the numerator is the difference between the two group means; t will get larger as this difference gets larger. The denominator contains an estimate of the standard deviation of the differences between group means; in other words, it suggests how much the difference between group means would vary if the experiment were repeated many times. Because this estimate is in the denominator, the value of t will get smaller as the standard deviation of group differences gets larger. For the data in Table 1,

\[ t = \frac{M_1 - M_2}{\sqrt{\dfrac{(N_1 - 1)S_1^2 + (N_2 - 1)S_2^2}{N_1 + N_2 - 2}\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}} = \frac{7.23 - 5}{\sqrt{\dfrac{(12)(5.09) + (12)(4.46)}{24}\left(\dfrac{26}{169}\right)}} = \frac{2.23}{\sqrt{.735}} = 2.60 \text{ with } 24\ df \]

To determine what a particular t means, we must use the value of N and a special statistical table called, appropriately enough, the t table. We have reproduced part of the t table in Table 4. First, we have to find the computed value of t in the row corresponding to the degrees of freedom, or df, associated with the experiment. In this case, degrees of freedom are simply N1 + N2 − 2 (or two less than the total sample size or number of scores). Because our experiment had 13 participants per group, df = 13 + 13 − 2 = 24. In the row for 24 df in Table 4, you will find increasing values of t in each column. These columns correspond to decreasing p values, the probabilities that the difference between means occurred by chance. If an obtained t value is equal to or larger than one of the values in the t table (on the correct df line), then the difference between means that generated that t is said to be significant at the .10, .05, or .01 level of probability.

degrees of freedom (df): The total sample size or number of scores in a data set, less the number of experimental groups.
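As a check on the arithmetic before consulting Table 4, here is a sketch (not from the text) that plugs the values quoted above into the t formula; the means and squared standard deviations are taken from the worked example rather than recomputed from the raw scores:

    import math

    m1, m2 = 7.23, 5.0         # high- and low-incentive means from Table 1
    s1_sq, s2_sq = 5.09, 4.46  # squared standard deviations quoted in the text
    n1 = n2 = 13

    pooled = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (n1 + n2) / (n1 * n2))
    print(round(t, 2), "with", n1 + n2 - 2, "df")  # 2.6 with 24 df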


TABLE 4 The t Table

This table allows the researcher to determine whether an obtained t value is statistically significant. If the t value is larger than the one in the appropriate row in the .05 column, the difference between means that generated that t score is usually considered statistically significant.

            p Value
df    .10 (10%)    .05 (5%)    .01 (1%)
4     1.53         2.13        3.75
9     1.38         1.83        2.82
14    1.34         1.76        2.62
19    1.33         1.73        2.54
22    1.32         1.71        2.50
24    1.32         1.71        2.49

Suppose, for example, that an obtained t (with 19 df) was 2.00. Looking along the 19 df row, you find that 2.00 is larger than the value in the .05 column. This allows you to say that the probability that the difference between means occurred by chance was no greater than .05, or 5 in 100. If the t had been less than the value in the .05 column, the probability of a chance result would have been greater than .05. As noted earlier, when an obtained t is not large enough to exceed t table values at the .05 level, at least, it is not usually considered statistically significant. The t value from our experiment was 2.60, with 24 df. Because 2.60 is greater than all the values in the 24 df row, the difference between the high- and low-incentive groups would have occurred by chance less than 1 time in 100. In other words, the difference is statistically significant.
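If SciPy is available (it is not mentioned in the text; this is purely illustrative), the table lookup can be replaced by an exact tail probability. The critical values in Table 4 appear to correspond to one-tailed probabilities, so the matching call is the upper-tail probability of the t distribution:

    from scipy import stats

    p = stats.t.sf(2.60, df=24)  # upper-tail probability for t = 2.60 at 24 df
    print(round(p, 3))           # about 0.008, i.e., less than 1 in 100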

Beyond the t Test  Many experiments in psychology are considerably more complex than simple comparisons between two groups. They often involve three or more experimental and control groups. Some experiments also include more than one independent variable. For example, suppose we had been interested not only in the effect of incentive size on performance but also in the effect of problem difficulty. We might then create six groups whose members would perform easy, moderate, or difficult problems and would receive either low or high incentives.

In an experiment like this, the results might be due to the size of the incentive, the difficulty of the problems, or the combined effects (known as the interaction) of the two. Analyzing the size and source of these effects is typically accomplished through procedures known as analysis of variance. The details of analysis of variance are beyond the scope of this book. For now, note that the statistical significance of each effect is influenced by the size of the differences between means, by standard deviations, and by sample size in much the same way as we described for the t test. For more detailed information about how analysis of variance and other inferential statistics are used to understand and interpret the results of psychological research, consider taking courses in research methods and statistical or quantitative methods.
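Although the details are beyond this appendix, the flavor of the procedure is easy to show. Assuming SciPy is available, the sketch below runs a one-way analysis of variance across three hypothetical difficulty groups; the scores are invented for illustration, and the full six-group incentive-by-difficulty design described above would call for a two-way analysis instead:

    from scipy.stats import f_oneway

    easy = [8, 9, 7, 9, 8]       # hypothetical scores, easy problems
    moderate = [6, 7, 5, 6, 7]   # hypothetical scores, moderate problems
    difficult = [3, 4, 2, 4, 3]  # hypothetical scores, difficult problems

    f_statistic, p_value = f_oneway(easy, moderate, difficult)
    print(f_statistic, p_value)  # a large F and a small p suggest real group differences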

SUMMARY

Psychological research generates large quantities of data. Statistics are methods for describing and drawing conclusions from data.

Describing Data
Researchers often test the null hypothesis, which is the assertion that the independent variable will have no effect on the dependent variable.

The Frequency Histogram  Graphic representations such as frequency histograms provide visual descriptions of data, making the data easier to understand.

Descriptive Statistics  Numbers that summarize a set of data are called descriptive statistics. The easiest statistic to compute is N, which gives the number of observations made. A set of scores can be described by two other types of descriptive statistics: a measure of central tendency, which describes the typical value of a set of data, and a measure of variability. Measures of central tendency include the mean, median, and mode; variability is typically measured by the range and by the standard deviation. Sets of data often follow a normal distribution, which means that most scores fall in the middle of the range, with fewer and fewer scores occurring as one moves toward the extremes. In a truly normal distribution, the mean, median, and mode are identical. When a set of data shows a normal distribution, a data point can be cited in terms of a percentile score, which indicates the percentage of people or observations falling below a certain score, and in terms of standard scores, which indicate the distance, in standard deviations, between any score and the mean of the distribution. Another type of descriptive statistic, a correlation coefficient, is used to measure the correlation between sets of scores.

Inferential Statistics
Researchers use inferential statistics to quantify the probability that conducting the same experiment again would yield similar results.

Differences Between Means: The t Test  One inferential statistic, the t test, assesses the likelihood that differences between two means occurred by chance or reflect the impact of an independent variable. Performing a t test requires using the difference between the means of two sets of data, the standard deviation of scores in each set, and the number of observations or participants. Interpreting a t test requires that degrees of freedom also be taken into account. When the t test indicates that the experimental results had a low probability of occurring by chance, the results are said to be statistically significant.

Beyond the t Test  When more than two groups must be compared, researchers typically rely on analysis of variance in order to interpret the results of an experiment.