Scatterplots— SPSS v.20 Legacy Graphs, Correlation & Regression Math 151

Many of the editing techniques work in other graph types too.

Data set govsal_vs_pay.sav: download from SPSSinfo or the
Day 12, 13, 14 webpages, or get it from the SPSS for class BPS5e folder.

CREATE

Graphs>Legacy Dialogs>Scatter/Dot
Simple Scatter. Define. Move variables from the lefthand list to the X-axis and Y-axis boxes.
(Like Average Pay → X, Gov. Salary → Y.) OK.

If you have a labeling variable (like names of states), move it to Label Cases by box (Need this to do Labeling, shown below!).

Subgroups (BPS5e Ch4 pp. 103-4 “Adding Categorical variables”) In the Simple Scatterplot dialog:

On the same graph: In the Scatterplot dialog, move the grouping variable to the Set Markers by box.
(Get different Colors-- best on screen or color printer. To get different Marker Types--best for black/white printer--see bottom of this page, or p. 2.)

Each on its own graph: Drag grouping variable to a Panel by box.

EDIT APPEARANCE (Double-click on the graph; the Chart Editor window opens.)

To leave the Chart Editor, do File>Close.
Edit only 1 graph at a time! In Chart Editor:

To Label one or more individual points: (New York, top graph)

(First: When making scatterplot, put the labeling variable in Label Cases by box!)

Click on a point, till all are outlined in gold.

Do Elements>Show Data Labels. Labels appear, and a Properties window will open. To change labels: Be in Data value Labels tab: If the labeling variable (State) is not in the Displayed box, find it in the lower box, move it to the Displayed box (drag it, or select it & use green arrow.) Remove other variables from the Displayed box using the red X. Apply. (I used Fill and Border tab to make both transparent: white box with red slash.)

Now everything is labeled—too much!

De-label all points: In main Chart Editor menu,
Elements>Hide Data Labels. Can’t? Click on a label till all labels are outlined. Menu should work now. (Note the Properties box changes.)

Now Label an individual point: (May work without above.) Elements> Data Label Mode. Or click on the gunsight icon, leftmost on the lowest toolbar. The mouse pointer turns into the Data labeler (gunsight). Use it to click on & label (or de-label) points at will. (To turn off the labeler, click on the toolbar icon again or re-do Elements> Data Label Mode.)

Editing various chart elements: if you click on an element of the graph (axis label, points, title…), it will be selected--outlined in gold. Edit> Properties or Ctrl-T opens the appropriate Properties window (if it’s not already) and you can change how it looks. For example,

Change marker style and color: (for instance, you want all black, different symbols for each Region, as shown top of page.) To get different shape markers, click on a dot to select all the dots. Get Properties window
(Edit> Properties or Ctrl-T if needed); find Variables tab. Make Element Type be Marker. In the big box, find the grouping variable (region). To the right it says Group, under it Style: Color. Click on Color, pull down the menu and choose Style:Shape from the list. Apply.

(I didn’t love the marker styles it chose this way: see p.2 for another way.)


Drawing Linear Regression line on graph:

In Chart Editor, do Elements> Fit Line at Total. Apply (if the line doesn’t appear automatically).
The line appears in the graph, along with a legend:
“R² Linear = 0.212”. This is “r²” (BPS5e p. 133-4). (To eliminate the line, click to gold-outline (select) it, then Edit>Delete (bottom of Edit menu).)

For a line for each subgroup, do Elements> Fit Line at Subgroups. (Apply). If you are using color to differentiate the dots, the lines will be in matching colors.

If the dots are already black, by shape, the lines will be in different dashes. How to distinguish? Legend is unclear. Still in Chart Editor, click on line till you have just one outlined. The region marker will be outlined. Take notes, label picture by hand. Or,

To get different dashed lines directly, select a line or lines. Get Properties window (Edit> Properties); find Variables tab. Make Element Type be Fit Line. In the big box, find the grouping variable (region). To the right find Group, under it Style: Color. Click on Color, pull down the menu (scroll down if needed), & pick Style:Dash from the list. Apply. →

If you have each subgroup on its own graph, Elements> Fit line at Total (Apply) will put in the correct line for each little graph.

“Smoother” (non-straight) fit line (similar to idea of ex. 4.32, p. 118): After putting in the Fit Line as above, in its Properties window, choose Loess. Apply.

======

In Chart Editor, more: Flaky in v. 20. Make a copy of your graph before proceeding. (Learn more? Try Help at the bottom of Properties window tabs.)

Editing various chart elements: click on an element of the graph (axis label, points, title…), to select it--outline in gold. Edit> Properties or Ctrl-T opens the appropriate Properties window (if it’s not already) and you can change how the element looks.

Resize a chart (like the panels, p.1), click outside the border to select the whole thing. Properties window: Chart size tab: uncheck Maintain aspect ratio box to make wider/taller, then use buttons to change Height/Width. (You can also do this back in the main scroll, dragging on border.)

Getting rid of an element. Click to outline (select) it. Edit>Delete (bottom of Edit menu).
Delete option Not there? Look on menus for a Hide… option.

Change axis limits (e.g. to show (0,0)) select the numbers on the axis, open Properties window (if it’s not already), Scale tab: Enter the desired Min, Max, and/or Major Increment (distance between numbers). Labels and Ticks tab, bottom: Minor Ticks (number of) to Display, is useful for making more gridlines (next):

Add gridlines "graph paper" look: Options>Show Grid Lines. To change looks, open Properties window (if it’s not already). Grid lines tab allows you to change how many lines. Lines tab to change looks. You can Show/Hide & edit lines for each axis separately by selecting the numbers on the axis, then Options>Show Grid Lines.

Want numbers on axes to have commas (40,000 not 40000)? Back in the Data Editor (Variable View), change the variable's Type from Numeric to Comma. Then make the graph.

Exclude a subgroup from the graph: click on a dot to select the data points, and open the Properties window; find Categories tab. Choose the categories you want to exclude in the top (Order) box and hit the big red X to demote them to the bottom (Excluded) box. Apply. (You can bring them back with the green arrow.)

I didn’t love the marker styles it chose (p.1): here’s another way. In the Region list to the right of the graph, Click on the circle next to the first category, MW. All the MW state dots should get gold rings (are selected). (If the Properties window isn’t open, Edit> Properties. ) In Properties window, Marker tab, change the Color, Border, by clicking in the Border square, then on the desired color on the right. (you can leave the Fill transparent.). Change the Marker, Type: pull down the Type menu and choose one. Apply and see that the graph changes. Back in the graph, click on the circle next to the next category, NE. Repeat changing the color and marker (a different marker). Repeat for the other groups.

If menu choices go gray, close SPSS and open it again (Brain overload).
Bugs abound. Save often! Edit>Undo (Ctrl-z) repeatedly sometimes will rescue.

Duplicate methods exist: Right-click gets a motley menu. Try buttons on toolbars.


Residuals: (BPS5e pp135-9)

Analyze>Regression>Linear:

Move Y (Response) variable to Dependent box,

X (Explanatory) variable to Independent box.

Now…one (or both) of these:

For Graph: Plots Button: Click *ZRESID across to Y, *ZPRED to X. Continue. OK. Will give this →

Note: If your regression has negative slope, this residual plot has the points going backward from their original plotted direction. Can be hard to interpret. Also values on both axes are in z’s—standard deviations from the mean.
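Why the points go "backward": a minimal Python sketch (not SPSS), using made-up x values and a made-up line with negative slope. It assumes *ZPRED is the predicted values turned into z-scores; the population standard deviation is used here only for convenience, and the sign flip does not depend on that choice.

```python
from statistics import mean, pstdev

def standardize(values):
    """Turn values into z-scores: (value - mean) / standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical x values and a hypothetical line with NEGATIVE slope.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = 10.0, -2.0
predicted = [a + b * xi for xi in x]

z_x = standardize(x)
z_pred = standardize(predicted)

# With a negative slope, the largest x gets the smallest z(predicted),
# so the left-to-right order of the points is reversed.
for xi, zx, zp in zip(x, z_x, z_pred):
    print(f"x={xi:.0f}  z(x)={zx:+.2f}  z(predicted)={zp:+.2f}")
```

In this toy case each z(predicted) is exactly the negative of z(x), which is the mirror-image effect described in the note.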

To create a variable containing the residuals:
Save button: Residuals, check Unstandardized. Continue. OK.

If you Save the residuals in this way, they appear in your data set as a new variable, RES_1. You can graph this against your original X(independent)-variable, using our usual Scatterplot.

Graphs>Legacy Dialogs>Scatter/Dot > Simple Scatter. Define.
Move Average Pay → X-axis,
Unstandardized Residual → Y-axis box. OK. Get →

Draw the horizontal line at 0 by hand on your paper, or, in the Chart Editor, do Options> Y Axis Reference Line.
In the Properties window, Reference Line tab, type a 0 in the Position box. Apply.

======

Seeing Deviations in the original graph: →
(This is NOT a “plot of the residuals”!)

Each residual is the vertical distance (deviation) from the observed value to the predicted (line) value (BPS5e p.129, 137). If the observed value is above the line, the residual is positive, and vice versa. We can picture the residuals (deviations) by running vertical lines from the points to the line.
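The sign rule can be checked with a small Python sketch (not SPSS). The constant and slope are copied from the Coefficients table in this handout; the two "observed" salaries and the average pay of 25000 are made up for illustration and are NOT rows of govsal_vs_pay.sav.

```python
# Constant and slope copied from the Coefficients table in this handout.
A = 28569.694   # constant (intercept)
B = 2.709       # slope on Average Pay

def predicted_salary(avg_pay):
    """Value on the regression line for a given average pay."""
    return A + B * avg_pay

def residual(avg_pay, observed_salary):
    """Residual = observed value minus predicted (line) value."""
    return observed_salary - predicted_salary(avg_pay)

# Hypothetical points, not taken from the data set.
example_pay = 25000
above = residual(example_pay, 110000)   # point above the line
below = residual(example_pay, 80000)    # point below the line

print(f"above the line: {above:+.3f}")
print(f"below the line: {below:+.3f}")
```

The first residual comes out positive and the second negative, matching "above the line is positive, and vice versa".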

Get your scatterplot and fit line (to whole dataset) as usual (p.1&2 here). Still in the Chart Editor: The Properties window, Fit line tab, should be open. (No? double click on the line to get the correct Properties window.) Check in the Display Spikes box, top left. Apply.
(It took 3+ versions of SPSS to unbreak this option.)

The “least-squares regression line” is what we use to fit the data set—it is the straight line that minimizes the sum of the squares of the deviations (residuals) shown in this graph.
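To see "minimizes the sum of the squares" in action, here is a rough Python sketch (not SPSS) on five made-up points. It computes the least-squares slope and constant from the usual textbook formulas, then nudges the line a little in each direction; the function names and data are assumptions for illustration only.

```python
from statistics import mean

def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

def sum_sq_dev(a, b, x, y):
    """Sum of squared vertical deviations from the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data, not from govsal_vs_pay.sav.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares(x, y)
best = sum_sq_dev(a, b, x, y)

# Shift the constant or tilt the slope slightly and recompute.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]
nudged = [sum_sq_dev(a + da, b + db, x, y) for da, db in nudges]

print(f"least-squares line: a={a:.2f}, b={b:.2f}, sum of squares={best:.2f}")
print("nudged lines:", [round(v, 2) for v in nudged])
```

Every nudged line gives a larger sum of squares than the least-squares line.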


Correlation and Regression in SPSS

Data set Govsal_vs_pay.sav: download from Day webpages or SPSS Info, or get from the SPSS for Class folder.

Correlation coefficient “r”: (BPS5e Ch4, pp. 104-11) Analyze>Correlate>Bivariate
(Pearson box checked) Move both variables into the right-hand box. OK. Get "Correlation matrix"

r = .460

(Correlations for subgroups: You need to select the subgroup in the Data, in the Data Editor, (Big First Handout, p.5 bottom) using

Data>Select Cases>If…, then do above, for each subgroup.)
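Why bother with subgroup correlations? A small Python sketch (not SPSS) with made-up rows shows that the overall r can hide what happens inside each group. The region codes NE and MW are borrowed from this handout; the numbers are hypothetical, and the filtering step plays the role of Select Cases.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical rows: (region, x, y). Not from the real data set.
rows = [
    ("NE", 1, 2), ("NE", 2, 4), ("NE", 3, 5),
    ("MW", 1, 5), ("MW", 2, 3), ("MW", 3, 2),
]

def r_for(rows, region=None):
    """r for one region, or for all rows when region is None."""
    chosen = [row for row in rows if region is None or row[0] == region]
    return pearson_r([row[1] for row in chosen], [row[2] for row in chosen])

overall = r_for(rows)
r_by_region = {region: r_for(rows, region) for region in ("NE", "MW")}

print(f"all rows: r = {overall:.3f}")
for region, r in r_by_region.items():
    print(f"{region}: r = {r:.3f}")
```

In this toy example the overall r is 0 while one group has a strong positive r and the other a strong negative r.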

Formula of Regression line (BPS5e Ch5 pp.125-34)

Analyze>Regression>Linear: Move Y (Response) variable to Dependent box, X (Explanatory) variable to Independent box. For just the formula stuff, OK.

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .460 (a)   .212       .195                16208.048

a. Predictors: (Constant), Average Pay ($) 1993
b. Dependent Variable: Governor's salary ($)

Find these tables in the output:

R Square is “r-squared” (r²) (BPS5e p. 133-4). Watch out: R is the correlation coefficient r with any minus sign dropped (R = |r|), so check the sign of the slope to get the sign of r.
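A quick arithmetic check in Python (not SPSS), using the r = .460 and slope 2.709 reported in this handout. The helper `r_from_output` is a made-up name; it just applies the rule that r has the same sign as the slope.

```python
# r from the correlation matrix in this handout.
r = 0.460

# Squaring r reproduces the R Square entry in the Model Summary table.
r_squared = round(r ** 2, 3)
print(r_squared)  # 0.212

def r_from_output(R, slope):
    """Recover r from the (always non-negative) R and the sign of the slope."""
    return R if slope >= 0 else -R

# Slope from the Coefficients table is positive, so r is positive.
print(r_from_output(0.460, 2.709))
# A negative slope (hypothetical) would mean r is negative.
print(r_from_output(0.460, -2.709))
```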

Line formula: ŷ = a + bx

ŷ = 28569.694 + 2.709x, or

GovSal = 28569.694 + 2.709 × AvgPay
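Using the formula to predict: a minimal Python sketch (not SPSS). The constant and slope are the ones above; the average pay values plugged in are made up for illustration.

```python
A = 28569.694   # constant a, from the formula above
B = 2.709       # slope b, from the formula above

def gov_sal(avg_pay):
    """Predicted governor's salary for a given average pay."""
    return A + B * avg_pay

# Hypothetical average pay values.
for pay in (20000, 25000, 30000):
    print(f"AvgPay = {pay:>6,}  ->  predicted GovSal = {gov_sal(pay):,.2f}")

# Meaning of the slope: each extra $1,000 of average pay
# raises the predicted salary by 1000 * b dollars.
step = gov_sal(21000) - gov_sal(20000)
print(f"change per $1,000 of average pay: {step:,.2f}")
```

Keep predictions to average pay values inside the range of the data; the line says nothing reliable outside it.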

Coefficients (a)

                            Unstandardized Coefficients   Standardized Coefficients
Model                       B            Std. Error       Beta                        t       Sig.
1  (Constant)               28569.694    16716.638                                    1.709   .094
   Average Pay ($) 1993     2.709        .754             .460                        3.592   .001

a. Dependent Variable: Governor's salary ($)

In the B column: the (Constant) row is a; the Average Pay row is b (slope). Average Pay is x; Governor's salary is y.

Read off constant a and slope b from the
B column of the Coefficients table.

Formula of line for a subgroup: as before,

Analyze>Regression>Linear: Move Y (Response) variable to Dependent box, X (Explanatory) variable to Independent box. Move the variable which defines subgroups to the Selection Variable box. Hit Rule button. Get →
For just the Northeast states, put NE in the box. The value must be typed exactly as it is in the dataset. Continue. OK.

New results are now just for states in NE. (More subgroups? Repeat whole thing for each subgroup. You can also exclude one subgroup: equal to box pulls down to offer not equal to.)
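What the rule does, sketched in Python (not SPSS): keep only the rows whose region matches the typed value. The rows and the `select` helper are made up for illustration; the sketch assumes matching is exact, including capitalization, which is the cautious reading of "typed exactly".

```python
# Hypothetical rows; region codes borrowed from this handout.
rows = [
    {"state": "State A", "region": "NE"},
    {"state": "State B", "region": "NE"},
    {"state": "State C", "region": "MW"},
    {"state": "State D", "region": "MW"},
    {"state": "State E", "region": "MW"},
]

def select(rows, value, rule="equal to"):
    """Keep rows according to the rule, comparing region to value exactly."""
    if rule == "equal to":
        return [row for row in rows if row["region"] == value]
    if rule == "not equal to":
        return [row for row in rows if row["region"] != value]
    raise ValueError(f"unknown rule: {rule}")

only_ne = select(rows, "NE")
typo = select(rows, "ne")                      # wrong capitalization
all_but_ne = select(rows, "NE", "not equal to")

print(len(only_ne), "rows for NE")
print(len(typo), "rows for ne (typed differently from the data)")
print(len(all_but_ne), "rows with NE excluded")
```

A value typed differently from the data selects nothing, so the regression would have no cases to work with.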

Graphing the line(s) on the scatterplot: p. 2, top

4 SPSSScatter151F12.doc 9/17/12