
Bivariate Analysis: Correlation

Correlation

- Used when you measure two continuous variables.
- Examples: association between weight & height; association between age & blood pressure.

Choosing a bivariate test (Variable 1 by Variable 2):

  Variable 2 \ Variable 1 | 2 levels         | >2 levels        | Continuous
  2 levels                | chi-square test  | chi-square test  | t-test
  >2 levels               | chi-square test  | chi-square test  | ANOVA (F-test)
  Continuous              | t-test           | ANOVA (F-test)   | Correlation / Simple linear regression

Pearson's Correlation Coefficient

- Correlation is measured by Pearson's correlation coefficient.
- A measure of the linear association between two variables that have been measured on a continuous scale.
- Pearson's correlation coefficient is denoted by r.
- A correlation coefficient is a number that ranges between -1 and +1.

Sample data:

  Weight (kg) | Height (cm)
  55          | 170
  93          | 180
  90          | 168
  60          | 156
  112         | 178
  45          | 161
  85          | 181
  104         | 192
  68          | 176
  87          | 186

[Scatter plot: height (cm) plotted against weight (kg)]
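The coefficient can be computed directly from its definition. A minimal pure-Python sketch, using the 10 weight/height pairs from the sample table above (the printed value is illustrative for this small sample only):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r = S_xy / sqrt(S_xx * S_yy), the scaled linear covariation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Sample data from the slide (weight in kg, height in cm)
weight = [55, 93, 90, 60, 112, 45, 85, 104, 68, 87]
height = [170, 180, 168, 156, 178, 161, 181, 192, 176, 186]

r = pearson_r(weight, height)
print(round(r, 3))  # positive r: taller students tend to be heavier
```

By construction r always lands in [-1, +1], which is why it can be read directly off the strength scale discussed below.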


- If r = 1 => perfect positive linear relationship between the two variables.

- If r = -1 => perfect negative linear relationship between the two variables.

- If r = 0 => no linear relationship between the two variables.

[Illustrative scatter plots for r = +1, r = -1, and r = 0]

http://noppa5.pc.helsinki.fi/koe/corr/cor7.html

[Practice scatter plots with correlations of -0.9, 0.8, 0.2, and -0.5]


Strength of the correlation:

  -1 ........ -0.5 ........ 0 ........ 0.5 ........ +1
  Strong     Moderate     Weak      Moderate     Strong

Example 1:

- Research question: Is there a linear relationship between the weight and height of students?
- Ho: there is no linear relationship between weight & height of students in the population (ρ = 0)
- Ha: there is a linear relationship between weight & height of students in the population (ρ ≠ 0)
- Statistical test: Pearson correlation coefficient (r)


Example 1: SPSS output

  Correlations
                                weight    height
  weight   Pearson Correlation  1         .651**
           Sig. (2-tailed)                .000
           N                    1975      1954
  height   Pearson Correlation  .651**    1
           Sig. (2-tailed)      .000
           N                    1954      1971
  **. Correlation is significant at the 0.01 level (2-tailed).

- Value of the statistical test (r): 0.651
- P-value: 0.000
- Conclusion: at a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the weight and height of students.

Example 2:

- Research question: Is there a linear relationship between the age and weight of students?
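The "Sig. (2-tailed)" value SPSS prints for r comes from a t test of Ho: ρ = 0, with t = r·sqrt((n - 2)/(1 - r²)) on n - 2 degrees of freedom. A short sketch reproducing the t statistic from the Example 1 numbers (r = 0.651, n = 1954 complete pairs):

```python
from math import sqrt

# Test of Ho: rho = 0 for a sample correlation r with n pairs
r, n = 0.651, 1954
t = r * sqrt((n - 2) / (1 - r ** 2))
print(round(t, 1))  # a very large t, hence the reported "Sig. .000" (p < 0.001)
```

With nearly 2,000 students even modest correlations give enormous t values, which is why all three examples below reach significance despite very different r's.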


Example 2: SPSS output

  Correlations
                                weight    age
  weight   Pearson Correlation  1         .155**
           Sig. (2-tailed)                .000
           N                    1975      1814
  age      Pearson Correlation  .155**    1
           Sig. (2-tailed)      .000
           N                    1814      1846
  **. Correlation is significant at the 0.01 level (2-tailed).

- Ho: ρ = 0; no linear relationship between weight & age in the population
- Ha: ρ ≠ 0; there is a linear relationship between weight & age in the population
- Value of the statistical test (r): 0.155
- P-value: 0.000


- Conclusion: at a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the weight and age of students.

Example 3:

- Research question: Is there a linear relationship between the age and height of students?

Example 3: SPSS output

  Correlations
                                age       height
  age      Pearson Correlation  1         .084**
           Sig. (2-tailed)                .000
           N                    1846      1812
  height   Pearson Correlation  .084**    1
           Sig. (2-tailed)      .000
           N                    1812      1971
  **. Correlation is significant at the 0.01 level (2-tailed).

- Ho: ρ = 0; no linear relationship between height & age in the population
- Ha: ρ ≠ 0; there is a linear relationship between height & age in the population
- Value of the statistical test (r): 0.084
- P-value: 0.000

- Conclusion: at a significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the height and age of students.

SPSS command for r (Example 1):

- Analyze
- Correlate
- Bivariate
- Select height and weight and put them in the "Variables" box.

In-class questions

T (True) or F (False):

1. In studying whether there is an association between gender and weight, the investigator found that r = 0.90 and p-value < 0.001, and concluded that there is a strong significant correlation between gender and weight.

2. The correlation between obesity and number of cigarettes smoked was r = 0.012 and the p-value = 0.856. Based on these results we conclude that there isn't any association between obesity and number of cigarettes smoked.

Simple Linear Regression

- Used to explain observed variation in the data.
- For example, we measure blood pressure (BP) in a sample of patients and observe:

  Patient # (i)   1    2    3    4    5    6    7
  Y = BP          85   105  90   85   110  70   115

- In order to explain why the BPs of individual patients differ, we try to associate the differences in BP with differences in other relevant patient characteristics (variables).
- Example: Can variation in blood pressure be explained by age?


Questions:

1) What is the most appropriate mathematical model to use? A straight line, parabola, etc.
2) Given a specific model, how do we determine the best-fitting model?

Mathematical properties of a straight line:

  Y = B0 + B1X

  Y  = dependent variable
  X  = independent variable
  B0 = Y intercept
  B1 = slope

- The intercept B0 is the value of Y when X = 0.
- The slope B1 is the amount of change in Y for each 1-unit change in X.
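The "best-fitting" line in simple linear regression is the least-squares line, whose coefficients have closed-form estimates: B1 = S_xy / S_xx and B0 = ȳ - B1·x̄. A minimal sketch (pure Python; the sanity-check data are made up so the points lie exactly on a known line):

```python
def fit_line(x, y):
    """Least-squares estimates for Y = B0 + B1*X: B1 = S_xy / S_xx, B0 = ybar - B1*xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Sanity check: points generated exactly from Y = 2 + 3X should recover B0 = 2, B1 = 3
b0, b1 = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)  # -> 2.0 3.0
```

Note that B1 shares its numerator S_xy with Pearson's r, which is why the slope test and the correlation test for the same two variables give identical conclusions.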


Estimation of a simple linear regression model

- Optimal regression line: Y = B0 + B1X

Example 1:

- Research question: Does height help to predict weight using a straight-line model? Is there a linear relationship between weight and height? Does height explain a significant portion of the variation in the observed values of weight?
- Model: Weight = B0 + B1 Height

SPSS output: Example 1

  Variables Entered/Removed(b)
  Model   Variables Entered   Variables Removed   Method
  1       height(a)           .                   Enter
  a. All requested variables entered.
  b. Dependent Variable: weight

  ANOVA(b)
  Model          Sum of Squares   df     Mean Square   F          Sig.
  1  Regression  169820.3         1      169820.297    1435.130   .000(a)
     Residual    230982.0         1952   118.331
     Total       400802.3         1953
  a. Predictors: (Constant), height
  b. Dependent Variable: weight

  Model Summary
  Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
  1       .651(a)  .424       .423                10.878
  a. Predictors: (Constant), height

  Coefficients(a)
  Model          Unstandardized B   Std. Error   Standardized Beta   t         Sig.
  1  (Constant)  -95.246            4.226                            -22.539   .000
     height      .940               .025         .651                37.883    .000
  a. Dependent Variable: weight
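The Model Summary's R Square is not a new quantity: it is the regression sum of squares divided by the total sum of squares from the ANOVA table, and its square root is the Pearson r between weight and height. A quick check with the reported numbers:

```python
# R Square = SS_regression / SS_total, both taken from the ANOVA table
ss_regression = 169820.297
ss_total = 400802.3

r_squared = ss_regression / ss_total
r = r_squared ** 0.5
print(round(r_squared, 3), round(r, 3))  # -> 0.424 0.651
```

This is why the Model Summary's R (.651) matches the Pearson correlation computed earlier for the same pair of variables.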


Interpreting the output (Example 1):

- R Square = 0.424: height explains 42.4% of the variation seen in weight.
- From the coefficients table, B0 = -95.246 and B1 = 0.940, so:

  Weight = -95.246 + 0.94 Height

- Increasing height by 1 unit (1 cm) increases weight by 0.94 kg.
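The fitted equation can be used directly for prediction; the input height of 170 cm below is just an illustrative value. A short sketch:

```python
# Fitted model from the SPSS output: Weight = -95.246 + 0.94 * Height
def predicted_weight(height_cm):
    return -95.246 + 0.94 * height_cm

print(round(predicted_weight(170), 3))                          # -> 64.554
# The slope is the change in predicted weight per 1 cm of height:
print(round(predicted_weight(171) - predicted_weight(170), 2))  # -> 0.94
```

The intercept (-95.246 kg at height 0) has no physical meaning here; it simply anchors the line within the observed range of heights.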

Simple Linear Regression

  Coefficients(a)
  Model          Unstandardized B   Std. Error   Standardized Beta   t         Sig.
  1  (Constant)  -95.246            4.226                            -22.539   .000
     height      .940               .025         .651                37.883    .000
  a. Dependent Variable: weight

- Ho: B1 = 0
- Ha: B1 ≠ 0
- Because the p-value for B1 is < 0.05, we reject Ho and conclude that height provides significant information for predicting weight.

In-class questions

Question 1:

In a simple linear regression model the predicted straight line was as follows:

  Weight (kg) = 3.5 - 1.32 (weekly hours of PA)
  R² = 0.22; p-value for the slope = 0.04

- What is the dependent/independent variable?
  Dependent variable: weight.
  Independent variable: weekly hours of PA.

- What is the null hypothesis? Alternative?
  Ho: B(weekly hours of PA) = 0
  Ha: B(weekly hours of PA) ≠ 0

- Interpret the value of R².
  Number of weekly hours of PA explains 22% of the variation observed in weight.

- Is the association between weight & weekly hours of PA positive or negative?
  Negative.

- What is the magnitude of this association?
  -1.32 => a one-hour increase of PA in a week decreases weight by 1.32 kg.

- Is the association significant at a level of 0.05?
  Because the p-value for the slope is < 0.05, we reject Ho and conclude that weekly hours of PA provides significant information for predicting weight.

Question 2:

  Model Summary
  Model   R        R Square   Adjusted R Square   Std. Error of the Estimate
  1       .407(a)  .166       .164                10.396
  a. Predictors: (Constant), ISS - injury severity measure

  Coefficients(a)
  Model                             Unstandardized B   Std. Error   Standardized Beta   t       Sig.
  1  (Constant)                     .443               .747                             .593    .554
     ISS - injury severity measure  .661               .066         .407                9.945   .000
  a. Dependent Variable: Length of hospital stay

- What is the dependent/independent variable?
  Dependent variable: length of hospital stay.
  Independent variable: ISS (injury severity score).

- Interpret the value of R².
  ISS explains 16.6% (R² = 0.166) of the variation observed in length of hospital stay.

- What is the null hypothesis? Alternative?
  Ho: B(ISS) = 0
  Ha: B(ISS) ≠ 0

- Is there a significant association between the dependent & the independent variable?
  Because the p-value for B(ISS) is < 0.05, we reject Ho and conclude that ISS provides significant information for predicting length of hospital stay.

- What is the magnitude of this association?
  0.661 => increasing ISS by 1 unit increases length of hospital stay by 0.661 days.

Biases

Bias is an error in an epidemiologic study that results in an incorrect estimation of the association between exposure and outcome.

Types of bias: selection bias, information bias, confounding bias.

Confounding Bias: Definition

Confounding is present when the association between an exposure and an outcome is distorted by an extraneous third variable (referred to as a confounding variable).

Confounding Bias: Example

Study the association between coffee drinking and lung cancer (LC):

               LC: Yes   LC: No
  Coffee Yes   80        15
  Coffee No    20        85

  OR = (80 × 85) / (15 × 20) ≈ 22.7

What would you conclude?

Confounding Bias: Minimizing bias

- Research design:
  - Use of randomization
  - Restriction
- Data analysis:
  - Multivariate statistical techniques
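The odds ratio for any 2×2 exposure/outcome table is the cross-product ratio. A minimal sketch applied to the coffee/lung-cancer counts above:

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR for the 2x2 table [[a, b], [c, d]] = (a*d) / (b*c)."""
    return (a * d) / (b * c)

# From the slide: exposed cases a=80, exposed non-cases b=15,
# unexposed cases c=20, unexposed non-cases d=85
or_crude = odds_ratio(80, 15, 20, 85)
print(round(or_crude, 1))  # -> 22.7
```

A crude OR this large does not by itself establish causation; as the slide goes on to show, a confounder such as smoking could be driving the apparent association.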

Multivariate analyses

- Logistic regression: if the outcome has 2 levels.
- Multiple linear regression: if the outcome is continuous.

Multivariate analysis is used for adjusting for confounding variables.

Multivariate analysis: why?

- To investigate the effect of more than one independent variable.
- To predict the outcome using various independent variables.
- To adjust for confounding variables.
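One classical way to see what "adjusting for a confounder" does, before reaching for regression, is stratification with the Mantel-Haenszel pooled odds ratio. A sketch with hypothetical (made-up) coffee/lung-cancer counts stratified by smoking status, constructed so the within-stratum OR is 1 in both strata, i.e. smoking fully explains the crude association:

```python
def crude_or(strata):
    """Collapse the strata into one 2x2 table and take its odds ratio."""
    a = sum(s[0] for s in strata); b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata); d = sum(s[3] for s in strata)
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """OR_MH = sum(a_i*d_i / n_i) / sum(b_i*c_i / n_i) over strata (a, b, c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical counts per stratum: (exposed cases, exposed controls,
# unexposed cases, unexposed controls)
strata = [(70, 10, 7, 1),     # smokers
          (10, 40, 13, 52)]   # non-smokers

print(round(crude_or(strata), 2))            # inflated crude association
print(round(mantel_haenszel_or(strata), 2))  # adjusted OR -> 1.0 (no association)
```

Multivariate regression (logistic regression here, since the outcome has 2 levels) generalizes this idea to several confounders at once, which is the point of the bullets above.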