I. Descriptive Statistics (3)

Correlation
• Relationship between variables
• Scatterplots (Scatter Diagrams)
• Measuring Correlation: the Pearson correlation coefficient, r

Correlational studies
Studies in which two or more variables are measured to find the direction and degree to which they covary.
Covary: Two variables covary when a change in one variable is related to a consistent change in the other variable.

Relationship between variables
Positive relationship: A relationship between two variables in which, as the value of one variable increases, the value of the other variable tends to increase also.
Negative relationship: A relationship between two variables in which, as the value of one variable increases, the value of the other variable tends to decrease.
No relationship: Lack of relationship. No regularity among the pairs of values of the variables.

Scatterplots (Scatter Diagrams)
Bivariate distribution: A distribution in which two scores are obtained from each subject.
Scatterplot: A graph of a bivariate distribution in which the X variable is plotted on the horizontal axis and the Y variable is plotted on the vertical axis.

[Figures: scatterplots showing positive correlation, negative correlation, and no correlation]

Linear relationship: A relationship between two variables that can be described by a straight line.
Curvilinear relationship: A relationship between two variables that can be described best with a curved line.
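The distinction between linear and curvilinear relationships matters once a correlation coefficient is computed, because a coefficient built for straight-line relationships can miss a perfectly regular curved one. The sketch below is only an illustration: the data are made up, the helper name `pearson_r` is hypothetical, and the coefficient is computed from deviation scores.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient from deviation scores (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross products of deviations, and the two sums of squares.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up X values, symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]

# A perfectly linear relationship and a perfectly curvilinear one.
linear_ys = [2 * x + 5 for x in xs]
curved_ys = [x ** 2 for x in xs]

r_linear = pearson_r(xs, linear_ys)
r_curved = pearson_r(xs, curved_ys)

print("linear:", r_linear)
print("curved:", r_curved)
```

In this constructed case the straight-line data give the largest possible coefficient, while the curved data give a coefficient of zero even though Y is completely determined by X. Inspecting the scatterplot first is what reveals the difference.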

Measuring Correlation

Correlation Coefficient: A number between –1 and 1 that describes the relationship between pairs of variables.
Pearson Correlation Coefficient: A correlation coefficient, symbolized by r, that indicates the degree of linear relationship between two variables measured at the interval or ratio level.

Pearson Correlation Coefficient
$r = \frac{\sum (x - \bar{X})(y - \bar{Y})}{n \, s_x s_y}$
$r = \frac{\sum z_x z_y}{n}$ (z score formula)
$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \, \sqrt{n \sum y^2 - (\sum y)^2}}$ (computational formula)

What does r mean?
• r is a type of mean: the mean of the products of paired z scores
• $-1 \le r \le 1$
• Based on a measure of covariation: cross products
• The value of r is a measure of how well a straight line describes the cluster of dots in a scatterplot
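The three formulas for r are algebraically equivalent, which can be checked numerically. The following is a minimal sketch, assuming that the standard deviations divide by n (the form under which the z score formula divides by n as well). The data and all function names are made up for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def pop_sd(values):
    """Standard deviation dividing by n (an assumption of this sketch)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def r_deviation(xs, ys):
    """r = sum of cross products of deviations / (n * s_x * s_y)."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cross_products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross_products / (n * pop_sd(xs) * pop_sd(ys))

def r_z_scores(xs, ys):
    """r = mean of the products of paired z scores."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = pop_sd(xs), pop_sd(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / n

def r_computational(xs, ys):
    """r from raw sums only (computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up paired scores, one (x, y) pair per subject.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

for formula in (r_deviation, r_z_scores, r_computational):
    print(formula.__name__, round(formula(xs, ys), 4))
```

For these made-up scores all three functions return the same value, roughly 0.77, up to floating-point rounding. The computational formula is convenient by hand because it needs only raw sums, while the z score formula makes the "mean of products" interpretation explicit.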

But keep in mind that a high correlation does not mean that there is a cause-effect relation! Experimentation is needed!

I. Descriptive Statistics (4)

Regression
• Regression Line and Predictive Errors
• Regression Line
• Least Squares Regression Equation
• Standard Error of Prediction
• What is $r^2$?

Regression: Building on Correlation
Prediction (regression) vs. relation (correlation)
(Simple) regression: Statistical tool used to predict scores on the dependent variable from scores on (one) independent variable.

Regression Line and Predictive errors
When a bivariate distribution shows a linear relationship, it is sometimes useful to try to predict Y from X using a regression line. This line is conceived as an approximation to the cloud of points observed in the scatterplot.

Predictive error: For each X, the difference between the observed Y and the Y-coordinate of the regression line at that X.
The position of the regression line should minimize the total predictive error.

Equation for a line
Slope: The amount that Y is predicted to increase for an increase of 1 in X.
Y-intercept: The predicted value for Y when X is 0 (the point at which the line intercepts the y-axis).

y = 2x + 5
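Reading this example line against the definitions of slope and Y-intercept: the slope is 2 and the Y-intercept is 5. A minimal sketch makes both readings concrete. The function name `predict_y` is hypothetical and chosen only for this illustration.

```python
def predict_y(x, slope=2, intercept=5):
    """Return the Y value on the line y = slope * x + intercept."""
    return slope * x + intercept

# Y-intercept: the predicted Y when X is 0.
y_at_zero = predict_y(0)

# Slope: the change in predicted Y for an increase of 1 in X.
change_per_unit = predict_y(4) - predict_y(3)

print("predicted Y at X = 0:", y_at_zero)
print("increase in Y per unit of X:", change_per_unit)
```

The difference between any two predictions one unit of X apart is the same, which is what makes the relationship a straight line.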

Least Squares Regression line
Least squares regression line: The prediction line that minimizes the total squared predictive error.
It has the form: y = mx + n (with slope m and Y-intercept n)

Least Squares equation
The least squares regression equation minimizes the total of all squared prediction errors for known Y scores in the original correlation analysis.
$\hat{Y} = bX + a$
The slope is: $b = r \, (S_y / S_x)$
The Y-intercept is: $a = \bar{Y} - b\bar{X}$
Assumptions: Linearity and
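The slope and intercept formulas can be tried on a small data set, along with a check that the resulting line really does give a smaller total squared error than nearby lines. This is a sketch under stated assumptions: the data are made up, the function names are hypothetical, and the standard deviations divide by n (the ratio $S_y / S_x$ is the same if both divide by n - 1).

```python
import math

def fit_least_squares(xs, ys):
    """Return (b, a) using b = r * (Sy / Sx) and a = mean_y - b * mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * s_x * s_y)
    b = r * (s_y / s_x)
    a = mean_y - b * mean_x
    return b, a

def total_squared_error(xs, ys, b, a):
    """Sum of squared differences between observed Y and predicted Y."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# Made-up paired scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b, a = fit_least_squares(xs, ys)
best_error = total_squared_error(xs, ys, b, a)

# Total squared error for lines with a nudged slope and/or intercept.
nudged_errors = [
    total_squared_error(xs, ys, b + db, a + da)
    for db in (-0.1, 0.0, 0.1)
    for da in (-0.5, 0.0, 0.5)
    if (db, da) != (0.0, 0.0)
]

print("slope b:", round(b, 3), "intercept a:", round(a, 3))
print("least squares error:", round(best_error, 3))
print("smallest nudged error:", round(min(nudged_errors), 3))
```

Every nudged line has a larger total squared error than the fitted one. Trying a handful of alternatives is not a proof of minimality, only a numerical illustration of what "least squares" means.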

Least Squares equation
Where do the values for a and b come from? And what do they mean? (see pdf notes)

Standard error of prediction
How to determine the amount of error associated with these predictions?
Standard error of prediction (or standard error of the estimate): A statistic that indicates the typical distance between a regression line and the actual data points.

Standard error of prediction (or standard error of the estimate):
$S_{y|x} = S_y \sqrt{1 - r^2}$
• It is a (rough) measure of the amount of predictive error by which known Y values deviate from predicted $\hat{Y}$ values
• It reflects the degree to which the points diverge from the regression line
• It reflects the accuracy of the prediction

What is $r^2$?
The squared correlation coefficient ($r^2$):
• is the proportion of total variability in one variable that is predictable from its relationship with the other variable
• provides a measure of the worth of least squares as predictors
$r^2 = \frac{SS_{total} - SS_{error}}{SS_{total}}$
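Both quantities can be computed directly from the prediction errors, which shows how they relate to one another. The sketch below uses made-up data and assumes that $S_y$ divides by n. Under that assumption, $S_y \sqrt{1 - r^2}$ equals the square root of the mean squared prediction error.

```python
import math

# Made-up paired scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and cross products of deviations.
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_total = sum((y - mean_y) ** 2 for y in ys)
cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = cross / math.sqrt(ss_x * ss_total)
b = cross / ss_x          # algebraically equal to r * (Sy / Sx)
a = mean_y - b * mean_x

# Squared prediction errors around the least squares line.
ss_error = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))

# r squared as the proportion of total variability that is predictable.
r_squared_from_ss = (ss_total - ss_error) / ss_total

# Standard error of prediction, two equivalent routes (dividing by n).
s_y = math.sqrt(ss_total / n)
s_y_given_x = s_y * math.sqrt(1 - r ** 2)
rms_error = math.sqrt(ss_error / n)

print("r squared:", round(r ** 2, 3), "from SS:", round(r_squared_from_ss, 3))
print("S_y|x:", round(s_y_given_x, 3), "RMS error:", round(rms_error, 3))
```

Squaring r and computing (SStotal - SSerror) / SStotal give the same number, and the standard error of prediction matches the root mean squared prediction error. Some textbooks divide by n - 2 instead when estimating from a sample, which would give a slightly larger value.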
