Bivariate Statistics Introducing Mathematics
Barry Kissane

Module 7
Bivariate statistics

Statistical data analysis requires technological support for efficiency, and you will find that the ClassWiz supports bivariate statistics well. As in the previous module, we will use Statistics mode, this time focusing on bivariate statistics. Many of the calculator operations are similar to those used for univariate statistics.

Getting started with bivariate statistics

Bivariate statistics involves data with two variables, so that interest is generally in the relationship between the two variables. The calculator assumes that the variables are named x and y respectively. Start by entering Statistics mode with MENU 2. Each of the choices (except the first one) involves bivariate statistics. The choices refer to seven different models for representing the relationship between the variables. To begin with, select 2, which allows us to explore a linear relationship of the form y = a + bx.

Although it is rare, bivariate data sometimes have associated frequencies, so that each data pair might be repeated several times. We will assume for now that frequencies are not involved. To turn the frequency information off, use SET UP, select the second page with ▼ and then tap 1 Statistics. Select 2 Off and you will then see a blank data table for x and y as shown in the screen above.

Entering, editing and checking data

The calculator will allow up to 80 data points for each of two variables to be entered. If you have more than 80 pairs of data, you will need to use frequencies. To illustrate the use of the calculator for bivariate analysis, consider the data below. Nurses in a school were checking children's pulses and wanted to know whether good readings could be obtained after only 15 seconds, as they expected this would save them a lot of time. So they measured the number of heartbeats of a group of 14 children for 15 seconds and then measured the number of heartbeats again for 60 seconds.
They obtained the following results:

x (15 secs pulse)  14  16  12  15  13  19  14  25  22  23  24  17  20  18
y (60 secs pulse)  57  65  43  59  41  75  51  92  84  87  86  58  70  68

It is efficient to enter these measurements into the calculator in the x-column first, tapping the = key after each one. Notice that, after you tap =, the cursor moves down to the next row of the table and a 0 is stored in the y-column, as shown below. Then enter the y-values in the same way, after first moving the cursor using ▶ and ▲.

© 2016 CASIO COMPUTER CO., LTD. CASIO Worldwide Education website http://edu.casio.com/

Typing errors are always possible, especially if a large number of data points are entered, so it is wise to check the entries. If you make an error in entering a value before tapping the = key, you can correct it immediately using the DEL key, enter the correct value and then tap =. If you notice an error in an entered data point, use the cursor to highlight the incorrect point and retype it with the correct value.

As the data are entered, you can scroll up and down using ▲ and ▼ or left and right using ◀ and ▶. You can scroll in either direction and, in particular, can scroll down from the bottom value to the top value, or scroll up from the top value to the bottom value (as if the data were in a loop). As you scroll, you will see that highlighted values are shown in greater detail and size at the bottom of the screen than they are in the table, as with Table mode in the calculator.

An easy check on data entry involves the number of entries. In this case, the final data pair of (18, 68) is marked as the 14th point, which matches the number of entries in the data table above. If you wish to enter a new data pair while the data table is showing, tap OPTN 2 Editor.

Once all data are entered, tap AC and OPTN to enter the Statistics menu, the first page of which is shown below.
(Notice that this menu is different from the one obtained if you select OPTN when the data table is showing.)

It is a good idea to check the maximum and minimum entries, since these are often of interest and allow you to calculate the range, but can sometimes reveal typing errors. To access these from the OPTN menu, above, tap 2 2-Variable Calc and use ▼ to go down to the fourth page. All four minimum and maximum values for x and y are shown below. Each of these four values is correct in this case. To correct any errors, tap AC and then OPTN 4 Data to return to the data table for suitable editing.

Retrieving statistics

It is always important to check data accuracy before undertaking any statistical analysis. Once you are sure that data have been entered correctly, appropriate statistics can be obtained via the OPTN menu. Tap AC to leave the data table and OPTN to display the menu. Four pages of statistics are then available with 2 2-Variable Calc (the fourth page is shown above).

Introducing Mathematics with ClassWiz, Barry Kissane

The second page of the OPTN menu allows you to examine and use individual statistics. The mean of the shorter (15-second) pulses, represented by x̄, is available by tapping 2 Variable and then 1 =. The mean of the longer (60-second) pulses, represented by ȳ, can be obtained by first returning to the Variable menu in the second page of the OPTN menu and then tapping 7 =.

You can still use the calculator while in Statistics mode. For example, to calculate one quarter of the mean of the longer pulses immediately after obtaining the results above, use the replay key with ◀ and edit the expression, as shown below. Although this is close to the mean of the 15-second pulses, it is a little smaller than that mean, perhaps suggesting that the shorter readings are a little higher than might be expected.
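The checks and means described above can also be reproduced away from the calculator. The following is a minimal Python sketch using the pulse data from the table earlier in this module; the variable names are illustrative only and are not calculator commands.

```python
from statistics import mean

# Pulse data from the table in this module (14 children).
x = [14, 16, 12, 15, 13, 19, 14, 25, 22, 23, 24, 17, 20, 18]  # beats in 15 seconds
y = [57, 65, 43, 59, 41, 75, 51, 92, 84, 87, 86, 58, 70, 68]  # beats in 60 seconds

n = len(x)                   # number of data pairs entered
mean_x = mean(x)             # mean of the 15-second pulses
mean_y = mean(y)             # mean of the 60-second pulses
quarter_mean_y = mean_y / 4  # one quarter of the 60-second mean

# Quick data-entry checks: count, then minimum and maximum of each variable.
print(n, min(x), max(x), min(y), max(y))
# The two means and the quarter-mean comparison.
print(mean_x, round(mean_y, 2), round(quarter_mean_y, 2))
```

For these data the mean of x is 18 and one quarter of the mean of y is about 16.7, which is consistent with the comparison made above.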
As for univariate statistics, there are two measures of standard deviation available for each variable, with σ measuring the population standard deviation and s providing an estimate for a sample, as explained in Module 5. In general, s is a little larger than σ.

As well as these statistics, sometimes it is helpful to obtain the sums of the original scores and the sums of their squares, as these are the values used internally by the calculator to undertake the calculations for the standard deviation. These are shown below, obtained by first tapping 1 Summation in the second page of the OPTN menu.

Relationships between some of these statistics and the calculation of variances and means were described briefly in Module 5. However, most people are generally comfortable with allowing the calculator to complete the computations, and do not make use of these sums directly.

Using a linear model

The major reason for studying two variables at once is to understand the relationship between them. There are various kinds of relationships that the calculator allows you to explore. The most important of these involves a linear model of the form y = a + bx. (This is often written in schools in the form y = gradient × x + intercept; i.e. y = bx + a.) At the start of this module, you chose this model in the opening screen for Statistics mode. The calculator provides the best-fitting model of this kind for the data entered, by providing values for a and b.
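To see how the sums mentioned earlier in this section relate to the two standard deviations, here is a minimal Python sketch using the pulse data from this module. The helper name std_devs is hypothetical, and the shortcut formula it uses is one standard way of working from sums; whether the calculator uses exactly this formula internally is an assumption.

```python
from math import sqrt

# Pulse data from the table in this module (14 children).
x = [14, 16, 12, 15, 13, 19, 14, 25, 22, 23, 24, 17, 20, 18]  # beats in 15 seconds
y = [57, 65, 43, 59, 41, 75, 51, 92, 84, 87, 86, 58, 70, 68]  # beats in 60 seconds
n = len(x)

# The kinds of sums reported in a summation menu.
sum_x = sum(x)
sum_x2 = sum(v * v for v in x)
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))


def std_devs(total, total_sq, count):
    """Return (population sd, sample sd) from a sum and a sum of squares.

    Hypothetical helper for illustration only.
    """
    ss = total_sq - total ** 2 / count  # sum of squared deviations from the mean
    return sqrt(ss / count), sqrt(ss / (count - 1))


sigma_x, s_x = std_devs(sum_x, sum_x2, n)
sigma_y, s_y = std_devs(sum_y, sum_y2, n)

print(sum_x, sum_x2, sum_y, sum_y2, sum_xy)
print(round(sigma_x, 3), round(s_x, 3), round(sigma_y, 3), round(s_y, 3))
```

Dividing by n - 1 rather than n is what makes s a little larger than σ for the same data.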
To access the values of a and b, tap OPTN to display the OPTN menu for Statistics and then 3 to display the regression calculations. So, the best-fitting linear model for these data (rounded to two decimal places) is

y = -0.15 + 3.72x or y = 3.72x - 0.15

This is close to, but not quite the same as, the model y = 4x that might be expected for the pulses, based on an assumption that the number of beats in 60 seconds would be four times the number in 15 seconds.

The second page of the OPTN menu allows you to access individual regression statistics, through 4 Regression. The calculator allows you to use the linear model automatically to predict y values for particular x values. For example, to predict the number of beats (y) in 60 seconds when the number of beats in 15 seconds is x = 30, enter 30 and then the ŷ command in the Regression menu, followed by =. (The caret mark over the y refers to an estimated value.) So the linear model predicts about 112 beats in 60 seconds.

It is also possible to automatically use the linear model to predict an x value associated with a particular y value. For example, for y = 100, the x̂ command predicts that x ≈ 27 beats.

The calculator also provides a measure of how closely aligned the data are to the model studied. The statistic used is the correlation coefficient, represented by the symbol r, accessed in the Regression menu. The value always lies between -1 and 1, each of which represents a perfect fit to the model. In this case, the linear model is a good fit to the data, since r is very close to 1.

It is always a good idea to examine bivariate data visually, using a scatter plot, to see the apparent relationship between the variables.
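The regression results discussed in this section can be checked independently of the calculator. The following minimal Python sketch fits y = a + bx to the pulse data by ordinary least squares and then mirrors the ŷ and x̂ predictions and the correlation coefficient r. The function names predict_y and predict_x are hypothetical, and the sketch assumes the calculator's linear model is an ordinary least-squares fit, which the rounded values quoted in the text are consistent with.

```python
from math import sqrt

# Pulse data from the table in this module (14 children).
x = [14, 16, 12, 15, 13, 19, 14, 25, 22, 23, 24, 17, 20, 18]  # beats in 15 seconds
y = [57, 65, 43, 59, 41, 75, 51, 92, 84, 87, 86, 58, 70, 68]  # beats in 60 seconds
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross-products about the means.
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b = s_xy / s_xx              # slope of the least-squares line
a = mean_y - b * mean_x      # intercept of the least-squares line
r = s_xy / sqrt(s_xx * s_yy) # correlation coefficient


def predict_y(x_value):
    """Estimate y for a given x using the fitted line (hypothetical helper)."""
    return a + b * x_value


def predict_x(y_value):
    """Estimate x for a given y by inverting the fitted line (hypothetical helper)."""
    return (y_value - a) / b


print(round(a, 2), round(b, 2), round(r, 3))
print(round(predict_y(30)), round(predict_x(100)))
```

Rounded to two decimal places, this gives a = -0.15 and b = 3.72, with predictions of about 112 beats for x = 30 and about 27 beats for y = 100, in agreement with the values quoted above.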