<<

Chapter 22 Variables and Their

2.1 Levels of Measurement 2.2 Continuous and Discrete Variables Real Limits • Significant Figures • Rounding 2.3 Summation Notation • Computations 2.4 Connections Cumulative Review • Computers • Homework Tips Exercises for Chapter 2

Personal Trainer Lectlet 2A: Variables and Their Measurement Labs: Lab for Chapter 2 dataGen: Statistical Computational Package and Data Generator reviewMaster 2A Resource 2X: Additional Exercises

15

ch02.indd 15 6/7/17 4:02 PM 16 Chapter 2 Variables and Their Measurement

Learning Objectives ❶ What is a variable? ❷ What are the characteristics of the three measurement scales (nominal, ordinal, and interval/ratio) used to measure variables? Why do we lump interval and ratio variables together? ❸ What is the difference between a continuous and a discrete variable? ❹ What are the real limits of a measurement? ❺ What are the three parts of the rule about rounding? ❻ How is summation notation used to simplify the communication about sums?

“variable” can take on any of several values. There are three kinds of variables (nominal, ordinal, and interval/ratio) based on their level of Ameasurement. Most statistical computation involves summing values; the shorthand indication for a sum is ∑. variable Recall Pygmalion: Rosenthal and Jacobson (1968) led teachers to believe that the measurement second-graders identified as “bloomers” would show IQ spurts but the other children nominal ordinal would not spurt. There are three variables: the pretest IQ, the posttest IQ, and the intel- interval/ratio lectual growth (IQ gain). All three are interval/ratio variables. ∑ (SIG∙ma) = “sum of” Xi (EX∙sub∙EYE) ∑(2X — 4)2 = “sum of two X minus four tatistics can be thought of as the science of understanding data; data are the that quantity squared” results of a series of on one or more variables. Let us be clear DataGen (DAY∙ta∙jen) Sabout what these terms . variable A characteristic A variable is any characteristic of the world that can be measured and that can take that can take on several or on any of several or many different values. For example, height is a variable (defined many different values as the number of inches a person is tall). It takes on the value 70 inches when John is measured and 64 inches when Mary is measured. Values of a variable may sometimes be assigned arbitrarily; for example, biologi- cal sex is a variable to which we may assign the values 1 for female and 2 for male (or 2 for female and 1 for male, or 27 for female and 136 for male, or any other values we choose). measurement The Measurement is the procedure that assigns values to the variable. For our purposes, procedure for assigning a it is enough to realize that a measurement rule must provide a unique and unambiguous value to a variable result for every individual. Thus, in our sex example, although it makes no difference what values are assigned to female and male, the assignment must be the same for all females and for all males. It is not satisfactory measurement to begin by assigning 1 to females and then, halfway through our , change our minds and begin assigning 27. Personal Trainer Click Lectlets and then 2A in the Personal Trainer for an audiovisual discussion of Sec­tions 2.1 through 2.4. Lectlets

ch02.indd 16 6/7/17 4:02 PM 2.1 Levels of Measurement 17

2.1 Levels of Measurement Statisticians distinguish three kinds of variables and measurement levels: nominal, ordi­nal, and interval/ratio. The main characteristics of these kinds of variables are given in Table 2.1. nominal Classification The nominal scale of measurement classifies objects into categories based on of unordered variables some characteristic of the object. Examples of nominal measurements of people are male/female, Republican/Democrat/Independent/decline to state, fraternity member/ nonmember, and Californian/Ohioan/Nebraskan/Alaskan/and so on. In each case, the measurement is the placing of an individual into one of the categories in the measure- ment scale. We require that all measurement operations be mutually exclusive; that is, for example, a person cannot be both a Republican and a Democrat or both an Ohioan and a Nebraskan. The order of categories in a nominal variable is not important. For example, it doesn’t matter from a measurement standpoint whether we refer to the sex distinction as male/female or female/male. It may be useful to assign a numerical value to a nomi- nal category; we might let male = 1 and female = 2. Doing that does not change the fact that the order is irrelevant; it would have made just as much measurement sense to let male = 2 and female = 1. ordinal scale Measurement The ordinal scale of measurement classifies objects into mutually exclusive cat- of variables that have an egories based on some characteristic of the object (as does the nominal level of inherent natural order measurement), and furthermore it requires that this classification have some inher- ent, logical order. An example of ordinal measurement of people is class standing (freshman/sophomore/junior/senior) because it is both mutually exclusive (you can’t be both a freshman and a sopho­more) and ordered (sophomore is more advanced than freshman). Other examples are class rank (first in class/second/third/etc.), grade in course (A/B/C/D/F), and level of depression (not depressed/slightly depressed/moder- ately depressed/severely depressed). We may assign a numerical value to these categories, and if we do, the order of these categories is important (unlike in nominal variables where order is irrelevant). For example, it does make sense to assign freshman = 1, sophomore = 2, junior = 3, and senior = 4, but it does not make sense to assign sophomore = 1, senior = 2, fresh- man = 3, and junior = 4. Order is inherent in ordinal variables, and the values of the variables must reflect that order.

TABLE 2.1 Characteristics of nominal, ordinal, and interval/ratio levels of measurement Level of Measurement Characteristic Nominal Ordinal Interval/Ratio Categories are mutually exclusive. Yes Yes Yes Categories have logical order. No Yes Yes Equal differences in characteristic No No Yes imply equal differences in value.

ch02.indd 17 6/7/17 4:02 PM 18 Chapter 2 Variables and Their Measurement

interval/ratio scale A The interval/ratio level of measurement classifies objects into mutually exclusive measurement scale for categories based on some characteristic of the object (as do the nominal and ordinal ordered variables that has equal units of measurement levels of mea­surement). It requires that this classification have some inherent, logi- cal order (as does the ordinal level of measurement). Furthermore, it requires that the width of all the categories be the same. An example of an interval/ratio variable is temperature as measured in degrees Celsius (°C). When we require that the intervals be equal, we mean, for example, that the temperature difference between 34°C and 35°C is the same as the difference between 77°C and 78°C. It follows that the distance between nonconsecutive measurements must also be equal if the measured differences are equal. For example, the temperature difference between 34°C and 37°C must be the same as that between 75°C and 78°C because both differences are 3 degrees.

BOX 2.1 Ratio level of measurement c Some statisticians divide the interval/ratio level of measurement into two separate This box may be omitted without loss of continuity. levels, called the interval level and the ratio level. Their “interval” level is the same as what we have labeled “interval/ratio.” ratio scale An interval scale The ratio level of measurement has all the characteristics of the interval level, of measurement that has a and furthermore­ it requires that the scale have a true zero point. Examples of ratio true zero point measurements are weight (expressed as number of pounds), height (number of inches), and (number of minutes). A true zero point that the thing being measured actually vanishes when the scale reads zero. Thus, for example, the vari- able weight measures the heaviness of an individual; when the weight is 0 pounds, there is in fact no weight. Note that the existence of a true zero point on a scale does not imply the exis- tence of an individual whose measurement is zero. There are no 0-pound humans, for example. The existence of a true zero simply means that if there were an indi- vidual with no weight, then the weight scale would read 0. Temperature as measured in degrees Celsius is not a ratio scale because 0°C does not mean the absence of heat; 0°C is instead a relatively arbitrary point, the freezing point of pure water at sea level. The true zero of the Celsius scale (the complete absence of heat) is actually –273°C. By contrast, the Kelvin scale of tem- perature has absolute zero temperature as the 0 K point of the scale (thus making the freezing point of water +273 K). The Kelvin scale is therefore a ratio scale. For most statistical procedures (including all those described in this book), there is no difference between the interval and the ratio levels of measurement. Therefore, we will lump interval and ratio scales together, referring to either as an interval/ratio scale.

We must make the distinction among nominal, ordinal, and interval/ratio vari- ables because that are appropriate for variables measured at one level of c measurement may not be appropriate for variables measured at a different level. Nominal Ordinal Interval For example, if Mary is 64 inches tall and John is 70 inches tall, it is reasonable Ratio = NOIR as in “film noir” to say that their average height is 67 inches. However, suppose we code political party preference as Democrat = 1, Republican = 2, and Independent = 3 and that Mary is a Democrat and John is Independent.­ If we average Mary’s and John’s

ch02.indd 18 6/7/17 4:02 PM 2.2 Continuous and Discrete Variables 19

party preferences, we get (1 + 3)/2 = 4/2 = 2, which might lead us to the totally unreasonable conclusion that Mary and John’s average political party preference is Republican. This unreasonableness arises because the average is an appropriate for interval/ratio variables such as height but not for nominal variables such as political preference. We will return to this discussion in Chapter 4.

2.2 Continuous and Discrete Variables Variables at the ordinal or interval/ratio level of measurement can be classified as either continuous variable One continuous or discrete (all nominal variables are discrete). A continuous variable is one with an infinite number of that has an infinite number of possible values between any two adjacent scale values. values between adjacent scale values For example, height is a continuous variable because there is no limit to the refinement we can make in the measurement operation. If we measure more and more precisely, we can ascertain­ that Mary’s height is 64.4 inches, or 64.37 inches, or 64.372 inches, or 64.3719 inches. discrete variable One with A discrete variable is one that has no possible intermediate values between two no possible intermediate adja­cent points. For example, the number of children a woman has is a discrete vari- values between two adjacent points able, with possible values of only whole numbers, not 1.5 or 2.993. To restate: If, for any two values of a variable, you can imagine a meaningful interme­diate value, then the variable is continuous. Otherwise, the variable is discrete. Real Limits When we say that Mary is 64 inches tall, we do not in general mean that she is exactly 64 inches tall; she could actually be 63.8 inches or 64.372 inches tall. All measure- ments made on continuous variables are only approximations because of the (theo- retically) infinite number of possible measured values. When we say that Mary is 64 inches tall, we really mean (if we are to be precise) that Mary is taller than 63.5 inches and shorter than 64.5 inches. We sometimes say, then, that Mary’s height is 64 ± .5 real limits of a measurement inches. The two values 63.5 and 64.5 are referred to as the real limits of the measured The points that are half the value of 64 inches. The real limits are the points that are half the measuring unit above measuring unit above and below the measured value and below the measured value. For example, if Mary’s time in the 100-yard dash is recorded as 10.6 seconds, the real limits of that measurement are 10.55 and 10.65 sec- onds, half the measuring unit (.1 second) below and above the measured value of 10.6 seconds. Equivalently, we can say that Mary’s time is 10.6 ± .05 seconds. Significant Figures Statistical computations on continuous variables often result in values that have more digits than the numbers in the original data. For example, if Mary is 60 inches tall and both Natasha and Carla are 67 inches tall, then the average height of these three women is (60 + 67 + 67)/3 = 64.66666666 . . . inches. Two questions should be answered about such a result: (1) How many of those digits should we report if that average height is to be the final result of our computations? (2) How many digits should we carry if that average is a subcomputation whose result will not be reported directly but will be used in some later computation?

ch02.indd 19 6/7/17 4:02 PM 20 Chapter 2 Variables and Their Measurement

In the behavioral sciences, there is no generally accepted answer to either of these questions. I (and many others) have these two suggestions: 1. Report in the final answertwo more significant figures than were reported in the original data. Thus, because our height data were originally reported as whole numbers, our final answer should be reported as a value with two deci- mal places—that is, as 64.67 inches. If the original data had been reported in tenths, we would report a final answer with three decimal places, and so on. 2. Carry as intermediate subcomputations at least three more significant figures than were originally reported. Thus, the average height used as an intermedi- ate value should be carried forward as 64.667 inches. It is simpler (and often better) to carry forward as many figures as your calculator will hold and round only when you report the final answer. Rounding Now that we have determined how many digits to report, we must decide how to deter- mine the value of the final digit. Why, for example, did we write the final value of the average height as 64.67 and not as 64.66 inches? rounding The procedure for The procedure for determining the final digit is called rounding and is generally abbreviating a number accepted among all statisticians. The rule for rounding has three parts: Rounding rule parts T Part 1 (remainder < 5). If the remainder (the digits beyond the last digit to be reported) is less than 500 . . . , simply discard the remainder. For example, if we wish to round 8.347 pounds to one decimal place, the remainder is 47, which is smaller than 500 . . . , so we discard the remainder and round 8.347 to 8.3 pounds. T Part 2 (remainder > 5). If the remainder is greater than 500 . . . , add 1 to the last digit before the remainder and then discard the remainder. For example, if we wish to round 4.86524 miles to one decimal place, the remainder is 6524, which is larger than 500 . . . , so we increase the 8 by 1 and discard the remain- der, thus rounding 4.86524 to 4.9 miles. T Part 3 (remainder = 5). If the remainder is exactly 500 . . . (that is, exactly halfway between two candidates for the final digit), then use the even final digit c and discard the remainder. For example, if we wish to round 7.850 seconds to one decimal place, the remainder is 50, which is exactly equal to 500. . . . Thus, Part 3 may not be what you learned in math class. 7.850 is exactly halfway between 7.8 and 7.9, so we select the even final digit, If the remainder is 500, 8. Thus 7.850 rounds down to 7.8 seconds (not up to 7.9 seconds as perhaps statisticians round down half the time and round up half you learned in math class). Here’s another example of rounding rule part 3: if the time. we wish to round 2.35 gallons to one decimal place, the remainder is 5, which is equal to 500. . . . 2.35 is exactly halfway between 2.3 and 2.4, so we round c to the even final digit, 4. Thus, 2.35 rounds up to 2.4 gallons. Frequent mistake: rounding Some students find it easy to remember the following expression, which states 3.4501 to 3.4. 3.4501 rounds part 3 of the rule in a different but equivalent way: “Even—leave it. Odd—up.” For up to 3.5 (part 3 does not apply because the remainder example, suppose we wish to round 72.3500 feet to one decimal place. The remain- is not exactly 500 . . .). der is “500”—that is, exactly 5—so part 3 applies. Because the digit just before the

ch02.indd 20 6/7/17 4:02 PM 2.2 Continuous and Discrete Variables 21

remainder is “3,” which is odd, we follow “Odd—up” and round up to 72.4 feet. On the other hand, we may wish to round 81.65 grams to one decimal place. The remainder is “5” and the digit just before the remainder is “6,” which is even, so we “Even—leave it.” We leave the 6 as it is and round to 81.6 grams. Part 3 of the rule for rounding is necessary to prevent a systematic bias from enter- ing our data as a result of rounding. See Box 2.2 for an explanation.

BOX 2.2 How part 3 of the rounding rule prevents bias Suppose your friend Izzy Buyust tells you that part 3 of the rounding rule is too complicated: Why not just round all numbers ending in the digit 5 upward? Here’s how you can show Izzy that his procedure causes a statistical bias in his results. Add up the 100 numbers 0.1, 0.2, 0.3, . . . , 9.7, 9.8, 9.9, and 10.0. The sum is exactly 505. Now ask Izzy first to round each number using his procedure and then to sum the rounded values. His sum will be 510, not 505. When you round all numbers correctly (using part 3 of the rounding rule when necessary), your sum will be 505 as desired. Why the discrepancy? When you and Izzy round, you both leave 10 numbers unchanged (1.0 rounds to 1, 2.0 rounds to 2, etc.). Izzy rounds 50 numbers upward (all those that end in .5, .6, .7, .8, and .9) but only 40 numbers downward (those that end in .1, .2, .3, and .4), and therefore his procedure biases the sum upward (to 510 instead of 505). You, on the other hand, use part 3 of the rounding rule, so you round 45 values upward (half of those that end in .5 as well as all those that end in .6, .7, .8, and .9) and 45 values downward (half of those that end in .5 as well as all those that end in .1, .2, .3, and .4), and your sum comes out to be 505. Thus, part 3 of the rounding rule lets you avoid Izzy’s bias.

Table 2.2 gives more examples of the rounding process.

c TABLE 2.2 Examples of rounding Note that the last column Decimal Places Rounding Rule gives the rounding rule Desired Original Value Remainder Rounded Value Part 1 712.31 1 712.3 1 2 3.697 7 3.70 2 1 5.350 50 5.4 3 0 247.499 499 247 1 (not 3!) 1 84.25 5 84.2 3 1 84.2501 501 84.3 2 2 .005 5 .00 3 0 800.501 501 801 2 (not 3!)

ch02.indd 21 6/7/17 4:02 PM 22 Chapter 2 Variables and Their Measurement

2.3 Summation Notation It is often convenient to assign a symbol to take the place of the name of a variable. Most often we call the variable X; if there are two variables, we call them X and Y; and generally we call a third variable Z. When it is convenient, we use other symbols; for example, we might call height H instead of X. A is a collection of measurements on one or more variables. It is often con­venient to assign an index variable, usually i, that refers to a particular measure- TABLE 2.3 Heights of five ment’s position in a data set. Table 2.3 shows the measurements on the variable height people for five people. When we let i = 1, we refer to the first person, John. When we let i = 4, Height we refer to the fourth person, Carla. The order of data in a data set is usually arbitrary; Person Index (in.) we could arrange the people in alphabetical order, in which case i = 1 would refer to John 1 70 Carla. For data, we usually refer to the last value of i as n. Thus, n is the number Mary 2 64 of observations in the sample; n = 5 for the data in Table 2.3. Natasha 3 65 We often put a subscript on the symbol for the variable when we wish to refer Carla 4 65 Sam 5 68 to a specific data point rather than the variable in general. Thus,X 1 (pronounced EX∙sub∙WON or just EX∙WON) refers to the first value of X; in Table 2.3 (assuming

Xi The ith value of X we are using X to denote height), X1 = 70 inches, John’s height, and X4 = 65 inches, Carla’s height. In general, Xi (pronounced EX∙sub∙EYE) refers to the ith value of X. Many of the computations in statistics involve summing all the entries in a data set. To make communication about such sums more compact, statisticians have developed Xi (EX∙sub∙EYE) ∑ (SIG∙ma) = “sum of” summation­ notation. For purposes of this discussion, we will assume that we have ith (EYEth) made two measurements, X and Y, on each of six participants, as shown in Table 2.4.

∑ Symbol meaning “the We indicate the sum of all the values of the variable X as ∑Xi, which we read as “the sum of ” sum of X sub i.”1 The character ∑ (pronounced “SIG∙ma”) is the Greek capital S (for “sum”; ∑ looks like an E to some students, but it’s really an S). For the data of Table 2.4, we can expand the indicated sum as follows:

XXXXXXX (2.1) Sum of values of a variable ∑ i =+++1 234++56 =+724++23++6 = 24 TABLE 2.4 Measurements on variables X and Y In cases where it is unambiguous, we may drop the i subscript, understanding that ∑X means ∑X . Occasionally, again when the situation is unambiguous, we use three i Xi Yi i 1 7 5 dots (called “ellipses”) to indicate a sum; for example, we may write Equation (2.1) 2 2 12 as ∑Xi = X1 + X2 + … + X6, where the ellipses (…) stand for the missing X3 + X4 + X5. 3 4 4 4 2 10 5 3 9 Computations 6 6 9 The calculations required in statistics (and throughout this book) often require us to perform some calculations on the variable before we perform the summation. Suppose, for the data in Table 2.4, we wish to compute ∑X 2. That expression says, “Sum all the

1 Our summation notation ∑Xi is a simplification of the more complete notation sometimes used by statisti­ n cians ∑ i = 1 Xi, which is read the sum of Xi for all values of i from i = 1 to n.” Because in this book sums will always be taken over all the values of the variable (that is, in this book sums always run from i = l to n), we can use the simpler notation without ambiguity.

ch02.indd 22 6/7/17 4:02 PM 2.3 Summation 23

TABLE 2.5 Computing ∑X2 values of X 2.” Note carefully that ∑X 2 does not say “Sum all the values of X and then 2 X X2 square the sum” (that would be [∑X] , which a very different thing). Thus, the first 7 (7)2 = 49 principle of summation is to be careful to observe what is being summed. 2 2 (2)2 = 4 The computation of ∑X is shown in Table 2.5. Note that we put the original val- 4 16 ues of X in the first column and then create a new column to hold the X 2 values. The 2 4 required sum is then obtained simply by adding down the new column: ∑X 2 = 118. 3 9 2 6 36 Calculations may be even more complex. For example, ∑3X is obtained by first ∑X2 = 118 squaring the X values, as shown in the second column of Table 2.6, and then multi- plying each squared X value by 3, as shown in the third column. Then ∑3X 2 = 354 is obtained by simply adding down that last column. TABLE 2.6 Computing ∑3X2 See Box 2.3 for a note about the organization of computations. XX2 3X2 7 49 3(49) = 147 BOX 2.3 A note regarding computations 2 4 13(4) = 12 4 16 48 The time you spend solving statistics problems can be divided into two parts: time 2 4 12 spent performing the computations and time spent finding errors that you made in 3 9 27 those computations. Many students tend to overlook the impact that error finding 6 36 108 has on their total study time. There are two ways to minimize the error-finding time: ∑3X2 = 354 Work carefully and without distraction in the first place, and organize your work so that errors are easy to locate. Plan on making errors! To err is human! Failing to plan for errors is also human but not very smart! Organize your computations so that you are performing only one operation at a time. For example, in computing ∑3X 2 for the values of X in Table 2.6, you might be tempted to square the numbers and multiply them in the same step, going through a mental process such as: “7 squared is 49 3 is 147, 2 squared is 4 times 3 is 12, 4 squared is 16 times 3 is 48, . . . .” That seems to be more effi- cient because it eliminates writing down a column of values (the second column of Table 2.6); however, alternating back and forth between squaring and multiplying substantially increases the likelihood of making a mistake. It is much better to per- form all the squarings first and then do all the multiplications. You may be saying to yourself, “I can square and multiply in my head; I don’t need that extra step!” and you may be right most of the time. But such a statement ignores the fact that if you do make a mistake, the mistake will be much harder to find when you have combined two computational steps. I highly recommend that computations be performed in this manner: (1) List the data in a column, (2) create a new column for every subcomputation, and (3) per- form a summation only by adding directly down one column. Thus, to compute the sum ∑3X2, we (1) list the original data as shown in the first column of Table 2.6, (2) create a new column with the heading “X2” and perform the squarings, and another new column with the heading “3X2” and perform the multipli- cations, and then (3) obtain the sum by simply adding down that last column. One operation per column leads to the lowest probability of making a mistake.

Let’s consider another example. Suppose we wish to evaluate the sum ∑(2X – 4)2 ∑(2X – 4)2 = “sum of two X minus four that quantity for these values of X: 7, 2, 4, 2, 3, and 6. We start by listing the data, as shown in the squared” first column­ of Table 2.7. Then, to follow the suggestion in Box 2.3 that we perform

ch02.indd 23 6/7/17 4:02 PM 24 Chapter 2 Variables and Their Measurement

TABLE 2.7 Computing only one operation­ at a time, we have to decide which operation to perform first. All ∑(2X − 4)2 calculations must follow the PEMDAS order-of-operations rules of basic algebra: X 2X 2X–4 (2X–4)2 First perform all operations inside parentheses; then perform exponentiation, then 7 14 10 100 multiplication or division, and only then addition or subtraction. Therefore, we must 2 4 0 0 begin with the expression inside the Parentheses, 2X – 4. In that expression, we must 4 8 4 16 perform the Multiplication first, so we form a new column with the heading “2X” and 2 4 0 0 do all the multiplications. Then we can do the Subtractions, creating a new column 3 6 2 4 6 12 8 64 with the heading “2X – 4.” That finishes the expression inside the parentheses, so now ∑(2X – 4)2 = 184 we perform the Exponentiation outside the parentheses by creating a new column with the heading “(2X – 4)2.” That is the expression we wish to sum, so we add down that c column to obtain ∑(2X – 4)2 = 184. 2 2 When you have built up a Note that ∑(2X – 4) is not equal to [∑(2X – 4)] ; that is, we do not obtain column heading that matches ∑(2X – 4)2 simply by adding down the third column of Table 2.7 and then squaring that the required sum, you can sum. result. Adding down the third column gives ∑(2X – 4) = 24, and squaring that result gives [∑(2X – 4)]2 = [24]2 = 576, which is not equal to ∑(2X – 4)2 = 184. TABLE 2.8 Computing ∑XY Occasionally we perform computations with two variables at a time. For exam- XY XY ple, ∑XY is obtained by first listing the values of X and Y as shown in Table 2.8. Then 7 5 35 we create a new column headed “XY” and find∑ XY = 176 by simply adding down 2 12 24 that column. 4 4 16 2 10 20 Note that ∑XY is not equal to ∑X ∑Y. This can be verified by noting that∑ X = 24, 3 9 27 as might be obtained by adding down the first column of Table 2.8, and∑ Y = 49, as 6 9 54 might be obtained by adding down the second column. ∑X ∑Y is thus 24(49) = 1176, ∑XY = 176 which is not equal to ∑XY = 176. We will use summation notation in all chapters in this textbook, and it is crucial Personal Trainer that you become confidently skilled in performing such computations. Click Algebra and then Summation in the Personal Trainer for practice in sum­ Algebra mation notation following the suggestion in Box 2.3.

2.4 Connections

Cumulative Review Chapter 1 discussed statistics in general and described the Pygmalion in the classroom that we will refer to frequently throughout this textbook. It also distinguished between populations (all the members of the group under consideration) and samples (some subset of that group), and between parameters (characteristics of populations) and statistics (characteristics of samples). It also defined the probability of an event as the number of outcomes favorable to that event divided by the total number of outcomes. Computers Many computer programs are available that perform statistical computations—far too many to be covered in any single book. This textbook provides examples from three such programs: Personal Trainer’s DataGen, IBM SPSS Statistics Version 24, and Microsoft Excel (2010 for Windows or 2011 for Macintosh).

ch02.indd 24 6/7/17 4:02 PM 2.4 Connections 25

Personal Trainer’s DataGen, which I wrote, is designed especially for the student who is seeking to comprehend statistics. It allows easy access to all the computations typically encountered in introductory statistics so that the student can do as much or c as little of a computation by hand as is desired. It is remarkably easy to use: Most students have no trouble getting it to do what they want within the first 5 minutes of Your instructor may suggest using either DataGen, SPSS, operation. All statistics are presented auto­matically; there are no “statistics” menus to or Excel. select or buttons to press. DataGen is thus a program specialized for the acquisition of statistical comprehension by introductory statistics students. Because of this special- ization, however, DataGen is by no means a comprehensive program and thus does not perform many important (but more advanced) statistical analyses. DataGen operates in the Excel environment; that is, your computer must have Excel (Windows 2007 or later, Macintosh 2011 or later) installed for DataGen to operate. IBM SPSS Statistics is a statistical package designed to analyze a wide variety of statistical data. There are many different versions of SPSS; the one we will use for examples is Version 24. The instructions for SPSS presented here apply to both the Windows and the Macintosh versions of SPSS. SPSS is extremely versatile, capable­ of performing many statistical tasks that go far beyond the needs of the introductory statistics student. The price for this versatility is that it is somewhat more difficult to use than DataGen. Microsoft Excel is a versatile spreadsheet that can be made to perform many sta- tistical functions (not as many as SPSS, more than DataGen). It is a widely used data management tool in industry and universities. Like SPSS, its versatility makes it some- what more difficult to use than DataGen to explore statistical relationships. DataGen is thus statistical explorational software; SPSS and Excel are statistical computational software. Exploration and computation overlap, but DataGen makes the exploration and comprehension of the basics of statistics transparent. It is much easier for the student to see what is hap­pening in a particular analysis. When the student moves beyond introductory statistics, however, DataGen’s limitations become more apparent—and then SPSS, Excel, or another complete package is the better choice. However, I have found that students who start with DataGen usually find that the later transition to SPSS or Excel (or other complete statistics package) is easy. Both the exploration/comprehension and the tool-acquisition arguments are persua­sive, so this textbook will provide examples from DataGen, SPSS, and Excel. Personal Trainer Click ESTAT and then DataGen in the Personal Trainer. Then use DataGen to DataGen compute ∑X and ∑Y for the data in Table 2.4: 1. In the Personal Trainer, click DataGen (Win) if you are a Windows user or DataGen (Mac) if you are a Macintosh user. DataGen (Win) requires that you have Microsoft Office Excel 2007 or later installed on your Windows computer. DataGen (Mac) requires that you have Microsoft Office Excel 2011 or later installed on your Macintosh computer. If you have a choice, use the Windows version of DataGen because Macintosh Excel does not allow multi- ple windows to be open at the same time. That forces Macintosh users to close one window to look inside another, a substantial inconvenience in DataGen. 2. If Excel presents a message that says “Macros have been disabled” or some- thing similar, click Enable Content or Enable Macros or whatever similar option is presented.

ch02.indd 25 6/7/17 4:02 PM 26 Chapter 2 Variables and Their Measurement

3. Enter the data from Table 2.4 into the DataGen spreadsheet. To enter values going down a column, press Enter after each value. To enter values going across a row, press Tab after each value. You may also navigate with the arrow keys or the mouse. You do not need to enter the i values because ESTAT auto- matically provides them. Enter the six X values into the Variable 1 column; enter the Y values into Variable 2. 4. If you like, you may change the headings: Click anywhere in the Variable 1 column. Click Rename variable; enter X into the cell provided; click OK. Also change the name of Variable 2 to Y. 5. Note that the sum of each column appears automatically in the DataGen Descrip­tive Statistics window on the line labeled “Sum of observations.” Now use DataGen to compute ∑XY: 6. Highlight the first six cells of theVariable 3 column. Then click Edit variable. 7. Next to the Product button check the boxes marked 1 and 2. Then click Product. 8. Note that the Variable 3 column now holds the product of X and Y. Note also that the sum of these products (176) now automatically appears in the DataGen window. (Macintosh users will have to close the Edit Variable window and then click Show Descriptive Statistics Window.) You can vary steps 6 and 7 to find any sum or product of variables. S Use SPSS to compute ∑X and ∑Y: a % 1. Start SPSS by clicking Start > All Programs > IBM SPSS > IBM SPSS SPSS Statistics 24. 2. Click the Type in Data button and then OK. 3. Enter the data from Table 2.4 into the SPSS spreadsheet. To enter values going down a column, press Enter or Tab after each value. To enter values going across a row, press after each value. You do not need to enter the i values. Just enter two columns of data (X and Y). 4. Ask SPSS to show sums by clicking Analyze, then Descriptive Statistics, and then Frequencies... . 5. Request va r 00001 to be displayed by clicking . Then click va r 00002 and click . 6. Click Statistics... , check the Sum box, click Continue, and click OK. 7. Figure 2.1 shows the output, with my annotations, that will appear in the Outputl–IBM SPSS Statistics Viewer window; the sums are in the table called “Statistics.” You should see 24.00 as the sum of VAR00001 and 49.00 as the sum of VAR00002. Now use SPSS to compute ∑XY: 8. Return to the Data Editor by clicking the Go To Data button (looks like a grid with a big star) on the button bar near the top of the Output1 screen.

ch02.indd 26 6/7/17 4:02 PM 2.4 Connections 27

Check that these are correct

Statistics VAR00001 VAR00002 N Valid 6 6 Missing 0 0 Sum 24.00 49.00

SX SY

FIGURE 2.1 Sample SPSS output: Descriptive statistics from Table 2.4 9. Click var in the third column. 10. Click Transform and then click Compute Variable... . 11. Create a new variable label: Type “product” in the “Target Variable” cell. 12. Request the multiplication: Click , click *, click VA R 00002, click , and then click OK. 13. Note that a new column is created that holds the product of X and Y. The val- ues should be 35.00 (= 7.00 × 5.00), 24.00, 16.00, 20.00, 27.00, and 54.00 Quite a few steps are involved in asking SPSS to perform this simple task. Fortunately, the effort expended here will make subsequent tasks easier. You can vary steps 9 and 10 to ask SPSS to create and evaluate any mathematical expression—an example of the versatility of SPSS.

H : m = m 0 1 2 Use Excel to compute ∑X and ∑Y.

X 1. Start Excel by clicking Start > All Programs > Microsoft Office > Microsoft Excel Excel 2010. The instructions for Excel 2011 or later are the same. 2. Enter the data from Table 2.4 into the Excel spreadsheet, starting in the first row of the first column. To enter values going down a column, press Enter or after each value. To enter values going across a row, press Tab or after each value. You do not need to enter the i values. Just enter two columns of data (X and Y). 3. Highlight the six values in column A. Click the Formulas tab; then click ∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum (24) appears in cell A7. 4. Highlight the six values in column B. Click ∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum (49) appears in cell B7. Now use Excel to compute ∑XY. 5. Click on spreadsheet cell C1. Type =A1*B1. Don’t forget to type the equal sign at the beginning of this entry. Then press Enter. Note that the value 35 (= 7 × 5) appears in cell C1.

ch02.indd 27 6/7/17 4:02 PM 28 Chapter 2 Variables and Their Measurement

6. Click on spreadsheet cell C1. Grab the fill handle ( , the little square box that appears at the lower right corner of the C1 cell; when you point to it, the square becomes a cross) and drag it down until the first six cells in column C are highlighted. Then release the handle. Note that you have filled cells C1 through C6 with the formula that you typed in cell C1. For example, click cell C2; note in the text entry area just above the spreadsheet that =A2*B2 appears. Excel has applied the formula you typed in cell C1, which you wrote to apply to the values in the first row, to the values in the second row. 7. Highlight the six values in column C. Click ∑ AutoSum in the ribbon at the top of the Excel window. Note that the sum (176) appears in cell C7. That is ∑XY. Homework Tips 1. Check the list of learning objectives at the beginning of this chapter. Do you under- stand each one? 2. The hint given in Box 2.3 will in the long run save you a lot of computational time, and I strongly recommend it even if it seems at first like a waste of time. Almost all students are surprised and frustrated by how many computational errors they make. This procedure will help you avoid those costly errors. 3. Remember to include at least three extra significant figures in subcomputations. c 4. Be perfectly clear that [∑(X – 10)]2 is not the same as ∑(X – 10)2. If we follow This is the most pervasive the advice of Box 2.3, computing [∑(X – 10)]2 requires the creation of one new computational mistake in 2 introductory statistics. column headed “X – 10.” By contrast, computing ∑(X – 10) requires the creation of two new columns, the first headed “X – 10” and the second headed “(X – 10)2.” For practice, click Algebra and then Summation in the Personal Trainer. Personal Trainer Click ReviewMaster and then Chapter 2 in the Personal Trainer for an electronic ReviewMaster interactive review of the concepts in Chapter 2. Personal Trainer Click Labs and then Chapter 2 in the Personal Trainer for interactive practice of Labs the skills in Chapter 2 and a quiz to test your understanding.

ch02.indd 28 6/7/17 4:02 PM Chapter 2 Exercises 29

CHAPTER 2 Exercises Section A: Basic Exercises (b) Number of bar presses a rat makes (Answers in Appendix D, page 575) (c) Number of students in your statistics class (d) Fuel consumption (miles per gallon) 1. (a) Suppose I am interested in the effect of family size on life satisfaction. I code family size as 0 4. Consider the sample data given in the table. if four or fewer individuals live in the house- hold and 1 if more than four individuals live in i X the household. Does such a coding constitute a 1 6 measurement of family size? If not, why not? 2 1 3 9 (b) Halfway through my study, I find that I am get- 4 –5 ting too many 0 family sizes, so I change my 5 6 coding. I leave my original codings unchanged, 6 0 but I now code family size as 0 if three or 7 –3

fewer individuals live in the household and 1 (a) What value does X have? if more than three live in the household. Does 4 (b) What is n? such a procedure constitute measurement? If (c) Compute X. not, why not? ∑ (d) Create a column headed “X 2” as described in Exercise I worked out Box 2.3 and compute ∑X 2. (a) Measurement is a procedure that assigns a unique (e) Compute (∑X)2. and unambiguous value to each participant. This (f) For these data, does ∑X 2 = (∑X)2? coding does that, so the answer is yes. (g) Create a column headed “X + 10” and compute (b) A measurement operation must be the same for all ∑(X + 10). participants, but here four-member households are (h) For these data, does ∑(X + 10) = ∑X + 10? treated inconsistently. Therefore the answer is no. (i) Create a column headed “X 2 – 5” and compute ∑(X 2 – 5). 2. Ray Shoscale wants to know the level of measure- 5. For the data of Exercise 4, consider the sum ment (nominal, ordinal, or interval/ratio) for the  36X 2  i − . following variables. Help Ray out. ∑  (a) Weights of people (measured in pounds)  4  (b) The (measured as 1 = strongly dis­ (a) How many columns are required for this agree; 2 = disagree; 3 = neutral; 4 = agree; and computa­tion, following the suggestions of 5 = strongly agree) Box 2.3? What are their headings? (c) Basketball players’ uniform numbers (0, 1, 2, (b) Compute the sum. . . . , 55) 6. Use the data in the accompanying table. (d) Clothing sizes (small, medium, large, extra (a) Find X – Y . large) 3 3 (b) Compute ∑XY. (e) Shades of nail polish (21 = “shameless rose,” 25 = “pink in the afternoon,” 33 = “orchid,” XY 41 = “berry rich,” 88 = “nouveau nude,” etc.) 3 2 (f) Volumes of bottles (measured in fluid ounces) 4 4 7 1 3. Which of the following variables are discrete and 3 11 which are continuous? (a) Distance (in miles) from one city to another

ch02.indd 29 6/7/17 4:02 PM 30 Chapter 2 Variables and Their Measurement

7. For the data of Exercise 6, consider the sum (h) For these data, does ∑(X – 3) = ∑X – 3? 2 (i) Create a column headed “X 2 – 9” and compute  2()XY−  . ∑ ∑(X 2 – 9).  XY+  (a) How many columns are required for this 12. For the data of Exercise 11, consider the sum computa­tion, following the suggestions of  ()XX32()4  ii−+. Box 2.3? What are their headings? ∑   2  (b) Compute the sum. (a) How many columns are required for this 8. For the data of Exercise 6, demonstrate each computa­tion, following the suggestions of expres­sion. Box 2.3? What are their headings? (a) ∑X 2 ≠ (∑X)2 (b) Compute the sum. (b) ∑(X + 5) ≠ ∑X + 5 13. Consider the data in the accompanying table. (c) ∑(X + 5)2 ≠ (∑X + 5)2 (a) Find X – Y . (d) ∑XY ≠ ∑X ∑Y 4 4 (b) Compute ∑XY. 9. (a) F or the data of Exercise 6, how many decimal places should you use in reporting the value of XY (∑X)/3 as a final answer? 1.4 5 (b) What is that value? Be sure to round correctly. 4.1 –2 –3.8 1 10. Round each value to two decimal places. –6.0 –10 (a) 14.637 (f) 3.315 2.2 2 (b) –14.637 (g) 3.325 14. For the data of Exercise 13, consider the sum (c) 2.152 (h) 3.335  (2XY−+)(XY2) (d) 3.40500 (i) 27.43475  . ∑ XY+  (e) 3.40501 (a) How many columns are required for this Section B: Supplementary Exercises computa­tion, following the suggestions of Box 2.3? What are their headings? 11. Use the sample data in the table. (b) Compute the sum. 15. For the data of Exercise 13, demonstrate each i X i expres­sion. 1 4 2 2 2 –5 (a) ∑X ≠ (∑X) 3 2 (b) ∑(X + 5) ≠ ∑X + 5 4 –8 (c) ∑(X + 5)2 ≠ (∑X + 5)2 5 21 (d) ∑XY ≠ ∑X ∑Y 6 3 7 13 16. (a) For the data of Exercise 13, how many decimal 8 6 places should you use in reporting the value of (∑X)/7 as the final answer? (a) What value does X2 have? (b) What is n? (b) What is that value? Be sure to round correctly. (c) Compute ∑X. 17. Round each value to one decimal place. (d) Create a column headed “X 2” as described in (a) 13.55 (f) 3.14159 Box 2.3 and compute ∑X 2. (b) 13.65000 (g) .02 (e) Compute (∑X)2. (c) –4.1472 (h) 6.66 (f) For these data, does ∑X 2 = (∑X)2? (d) –3.5001 (i) 12.5 (g) Create a column headed “X – 3” and compute (e) –3.6001 ∑(X – 3).

ch02.indd 30 6/7/17 4:02 PM Chapter 2 Exercises 31

Section C: Cumulative Review (e) Greater than or equal to 75 (Answers in Appendix D, page 576) (f) Less than 10 (g) Contains “7” as one (or more) of its digits 18. Suppose you take the numbers 1 through 100 and (h) Has digits that add to 10 write each one on a poker chip; then you put all the poker chips in a bag and mix them thoroughly. You 19. (Exercise 18 continued) Suppose I tell you that draw one chip from the bag. What is the probability the first digit is a “3” and it is a two-digit number. that you draw a chip with the property given? What is the probability that the chip is “34”? (a) A “34” 20. (Exercise 18 continued) Suppose I tell you that the (b) A “129” chip is an even number. What is the probability that (c) An odd number the chip is “34”? (d) An even number

Personal Trainer Click Resources and then 2X in the Personal Trainer for additional exercises. Resources

ch02.indd 31 6/7/17 4:02 PM ch02.indd 32 6/7/17 4:02 PM