Bivariate Analysis

Variable 2 \ Variable 1   2 LEVELS               >2 LEVELS              CONTINUOUS
2 LEVELS                  chi square test (X2)   chi square test (X2)   t-test
>2 LEVELS                 chi square test (X2)   chi square test (X2)   ANOVA (F-test)
CONTINUOUS                t-test                 ANOVA (F-test)         -Correlation
                                                                        -Simple linear Regression

Correlation
Used when you measure two continuous variables.
Examples:
-Association between weight & height.
-Association between age & blood pressure.
Pearson's Correlation Coefficient
Correlation is measured by Pearson's Correlation Coefficient.
A measure of the linear association between two variables that have been measured on a continuous scale.
Pearson's correlation coefficient is denoted by r.
A correlation coefficient is a number that ranges between -1 and +1.

Weight (Kg)   Height (cm)
55            170
93            180
90            168
60            156
112           178
45            161
85            181
104           192
68            176
87            186

[Figure: scatterplot of Height (vertical axis) against Weight (horizontal axis)]
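As a minimal sketch of how r could be computed for the ten weight/height pairs listed above, the following uses the usual definition of Pearson's coefficient (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name `pearson_r` is chosen here for illustration and is not from the slides.

```python
import math

# Weight (Kg) and height (cm) for the ten subjects listed above.
weights = [55, 93, 90, 60, 112, 45, 85, 104, 68, 87]
heights = [170, 180, 168, 156, 178, 161, 181, 192, 176, 186]

def pearson_r(x, y):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(weights, heights)
print(f"r = {r:.2f}")
```

The result is symmetric: swapping the two variables gives the same r, and the value always falls between -1 and +1.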
If r = 1 => perfect positive linear relationship between the two variables.
If r = -1 => perfect negative linear relationship between the two variables.
If r = 0 => No linear relationship between the two variables.
[Figure: three scatterplots illustrating r = +1, r = -1, and r = 0]
http://noppa5.pc.helsinki.fi/koe/corr/cor7.html
[Figure: four example scatterplots with r = -0.9, r = 0.8, r = 0.2, and r = -0.5]
[Figure: scale of r from -1 to 1 marked every 0.2, with regions labeled Strong, Moderate, Weak, Moderate, Strong from left to right]

Example 1:
Research question: Is there a linear relationship between the weight and height of students?
Ho: there is no linear relationship between weight & height of students in the population (ρ = 0)
Ha: there is a linear relationship between weight & height of students in the population (ρ ≠ 0)
Statistical test: Pearson correlation coefficient (r)
Example 1: SPSS Output

Correlations
                               weight    height
weight   Pearson Correlation   1         .651**
         Sig. (2-tailed)                 .000
         N                     1975      1954
height   Pearson Correlation   .651**    1
         Sig. (2-tailed)       .000
         N                     1954      1971
**. Correlation is significant at the 0.01 level (2-tailed).

Value of statistical test (r coefficient): 0.651
P-value: 0.000 (SPSS displays .000 when the p-value is below 0.0005)

Conclusion: At significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the weight and height of students.

Example 2:
Research question: Is there a linear relationship between the age and weight of students?
Example 2: SPSS Output

Correlations
                               weight    age
weight   Pearson Correlation   1         .155**
         Sig. (2-tailed)                 .000
         N                     1975      1814
age      Pearson Correlation   .155**    1
         Sig. (2-tailed)       .000
         N                     1814      1846
**. Correlation is significant at the 0.01 level (2-tailed).

Ho: ρ = 0 ; No linear relationship between weight & age in the population
Ha: ρ ≠ 0 ; There is a linear relationship between weight & age in the population
Value of statistical test: 0.155
P-value: 0.000
Example 2 (continued):
Conclusion: At significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the weight and age of students.

Example 3:
Research question: Is there a linear relationship between the age and height of students?
Example 3: SPSS Output

Correlations
                               age       height
age      Pearson Correlation   1         .084**
         Sig. (2-tailed)                 .000
         N                     1846      1812
height   Pearson Correlation   .084**    1
         Sig. (2-tailed)       .000
         N                     1812      1971
**. Correlation is significant at the 0.01 level (2-tailed).

Ho: ρ = 0 ; No linear relationship between height & age in the population
Ha: ρ ≠ 0 ; There is a linear relationship between height & age in the population
Value of statistical test: 0.084
P-value: 0.000
Example 3 (continued):
Conclusion: At significance level of 0.05, we reject the null hypothesis and conclude that in the population there is a significant linear relationship between the height and age of students.

SPSS command for r
Example 1
Analyze > Correlate > Bivariate
Select height and weight and put them in the “variables” box.
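The Sig. (2-tailed) values in the three Correlations tables come from a test of Ho: ρ = 0. A minimal sketch of the usual test statistic, t = r·sqrt(N − 2) / sqrt(1 − r²) with N − 2 degrees of freedom, is given below. The tables show only r, N, and Sig., so the t values here are a reconstruction under that standard formula, and the function name `t_statistic` is chosen for illustration.

```python
import math

def t_statistic(r, n):
    """t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# (r, N) pairs read from the SPSS Correlations tables above.
examples = {
    "weight & height": (0.651, 1954),
    "weight & age": (0.155, 1814),
    "age & height": (0.084, 1812),
}

t_values = {name: t_statistic(r, n) for name, (r, n) in examples.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

With samples this large, the t distribution is close to the standard normal, so any |t| above roughly 1.96 corresponds to a two-tailed p-value below 0.05. All three examples clear that threshold, which is consistent with the reported Sig. values.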
In-class questions

T (True) or F (False):
In studying whether there is an association between gender and weight, the investigator found out that r= 0.90 and p-value<0.001 and concludes that there is a strong significant correlation between gender and weight.

T (True) or F (False):
The correlation between obesity and number of cigarettes smoked was r=0.012 and the p-value= 0.856. Based on these results we conclude that there isn’t any association between obesity and number of cigarettes smoked.

Simple Linear Regression
Used to explain observed variation in the data.
For example, we measure blood pressure in a sample of patients and observe:

I=Pt#   1    2     3    4    5     6    7
Y= BP   85   105   90   85   110   70   115

In order to explain why the BP of individual patients differs, we try to associate the differences in BP with differences in other relevant patient characteristics (variables).
Example: Can variation in blood pressure be explained by age?
Questions:
1) What is the most appropriate mathematical Model to use? A straight line, parabola, etc…
2) Given a specific model, how do we determine the best fitting model?

Mathematical properties of a straight line
Y = B0 + B1X
Y = dependent variable
X = independent variable
B0 = Y intercept
B1 = Slope
The intercept B0 is the value of Y when X=0.
The slope B1 is the amount of change in Y for each 1-unit change in X.
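One common answer to question 2 is ordinary least squares, which picks the B0 and B1 that minimize the sum of squared vertical distances between the observed Y values and the line. The slides do not name the fitting criterion, so least squares is an assumption here. The sketch below applies it to the ten weight/height pairs from the correlation slide, with height as X and weight as Y; because this is only a ten-subject sample, the coefficients will differ from those in the SPSS output for the full student data set.

```python
# Weight (Kg) and height (cm) pairs from the correlation slide.
weights = [55, 93, 90, 60, 112, 45, 85, 104, 68, 87]
heights = [170, 180, 168, 156, 178, 161, 181, 192, 176, 186]

def fit_line(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical distances."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope: change in Y per 1-unit change in X
    b0 = mean_y - b1 * mean_x  # intercept: value of Y when X = 0
    return b0, b1

b0, b1 = fit_line(heights, weights)  # X = height, Y = weight
print(f"Weight = {b0:.2f} + {b1:.2f} Height")
```

The fitted line always passes through the point (mean of X, mean of Y), which is why the intercept can be recovered from the slope and the two means.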
Estimation of a simple Linear Regression Model
Optimal Regression line = B0 + B1X

Example 1:
Research Question: Does height help to predict weight using a straight line model? Is there a linear relationship between weight and height? Does height explain a significant portion of the variation in the values of weight observed?
Y = B0 + B1X
Weight = B0 + B1 Height
SPSS output: Example 1

Variables Entered/Removed(b)
Model   Variables Entered   Variables Removed   Method
1       height(a)           .                   Enter
a. All requested variables entered.
b. Dependent Variable: weight

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .651(a)   .424       .423                10.878
a. Predictors: (Constant), height

SPSS output (Continued): Example 1

ANOVA(b)
Model          Sum of Squares   df     Mean Square   F          Sig.
1 Regression   169820.3         1      169820.297    1435.130   .000(a)
  Residual     230982.0         1952   118.331
  Total        400802.3         1953
a. Predictors: (Constant), height
b. Dependent Variable: weight

Coefficients(a)
               Unstandardized Coefficients   Standardized Coefficients
Model          B          Std. Error         Beta                        t         Sig.
1 (Constant)   -95.246    4.226                                          -22.539   .000
  height       .940       .025               .651                        37.883    .000
a. Dependent Variable: weight
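The tables in this output are linked by simple arithmetic. As a rough check, assuming the standard definitions (R Square = regression sum of squares / total sum of squares, F = regression mean square / residual mean square, and Std. Error of the Estimate = square root of the residual mean square), the ANOVA figures reproduce the Model Summary values. The variable names below are chosen for illustration.

```python
import math

# Values read from the ANOVA table above (dependent variable: weight).
ss_regression = 169820.3
ss_residual = 230982.0
ss_total = 400802.3
df_regression = 1
df_residual = 1952

r_square = ss_regression / ss_total
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
std_error_of_estimate = math.sqrt(ms_residual)

print(f"R Square = {r_square:.3f}")
print(f"F = {f_statistic:.2f}")
print(f"Std. Error of the Estimate = {std_error_of_estimate:.3f}")
```

In a simple linear regression with one predictor, F is also the square of the t statistic for the slope, so 37.883 squared should land close to the F value in the table.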
SPSS output (Continued): Example 1

From the Coefficients table (Dependent Variable: weight):
B0 = (Constant) = -95.246
B1 = height = 0.940
Weight = B0 + B1 Height
Weight = -95.246 + 0.94 Height
Increasing height by 1 unit (1 cm) increases weight by 0.94 Kg.

From the Model Summary table:
R Square = 0.424
Height explains 42.4% of the variation seen in weight.
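To see what the fitted equation says in practice, the sketch below plugs two heights into Weight = -95.246 + 0.94 Height. The heights 170 cm and 171 cm are arbitrary illustration values, and `predict_weight` is a hypothetical helper name, not something produced by SPSS.

```python
B0 = -95.246  # (Constant) from the Coefficients table
B1 = 0.940    # height coefficient from the Coefficients table

def predict_weight(height_cm):
    """Predicted weight in Kg for a given height in cm."""
    return B0 + B1 * height_cm

# Two heights one unit apart, chosen only for illustration.
w170 = predict_weight(170)
w171 = predict_weight(171)
print(f"{w170:.3f} Kg at 170 cm, {w171:.3f} Kg at 171 cm")
print(f"difference = {w171 - w170:.2f} Kg")
```

The difference between the two predictions equals the slope, which is the sense in which a 1 cm increase in height goes with a 0.94 Kg increase in predicted weight.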
Simple Linear Regression
From the Coefficients table (Dependent Variable: weight): height B = .940, Std. Error = .025, t = 37.883, Sig. = .000
H0: B1=0
Ha: B1≠0
Because the p-value of B1 is < 0.05, we reject H0 and conclude that height provides significant information for predicting weight.

In-class questions
Question 1:
In a simple linear regression model the predicted straight line was as follows:
Weight (Kg) = 3.5 - 1.32 (weekly hours of PA)
R2= 0.22; p-value for the slope= 0.04

What is the dependent/ independent variable?
Dependent variable: Weight
Independent Variable: Weekly hours of PA
Question 1 (continued):
Weight (Kg) = 3.5 - 1.32 (weekly hours of PA)
R2= 0.22; p-value for the slope= 0.04

Interpret the value of R2
The number of weekly hours of PA explains 22% of the variation observed in weight.

What is the null hypothesis? Alternative?
H0: B(weekly hours of PA)=0
Ha: B(weekly hours of PA)≠0
Is the association between weight & weekly hours of PA positive or negative?
Negative

What is the magnitude of this association?
1.32 => One hour increase of PA in a week decreases weight by 1.32 Kg.
Is the association significant at a level of 0.05?
Because the p-value of B1 (0.04) is < 0.05, we reject H0 and conclude that weekly hours of PA provide significant information for predicting weight.

In-class questions
Question 2:

Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .407(a)   .166       .164                10.396
a. Predictors: (Constant), ISS - injury severity measure

Coefficients(a)
                                  Unstandardized Coefficients   Standardized Coefficients
Model                             B         Std. Error          Beta                        t       Sig.
1 (Constant)                      .443      .747                                            .593    .554
  ISS - injury severity measure   .661      .066                .407                        9.945   .000
a. Dependent Variable: Length of hospital stay
Question 2 (continued):
What is the dependent/ independent variable?
Dependent variable: Length of hospital stay
Independent Variable: ISS- Injury severity score

Interpret the value of R2
ISS explains 16.6% of the variation observed in length of hospital stay (R Square = .166; the value .407 is R, not R Square).
What is the null hypothesis? Alternative?
H0: B(ISS)=0
Ha: B(ISS)≠0

Is there a significant association between the dependent & the independent?
Because the p-value of B(ISS) is < 0.05, we reject H0 and conclude that ISS provides significant information for predicting length of hospital stay.
What is the magnitude of this association?
0.661 => Increasing ISS by 1 unit increases length of hospital stay by 0.661 days.

Biases
Bias is an error in an epidemiologic study that results in an incorrect estimation of the association between exposure and outcome.
-Selection bias
-Information bias
-Confounding bias

Confounding Bias: Definition
Is present when the association between an exposure and an outcome is distorted by an extraneous third variable (referred to as a confounding variable).
Confounding Bias: Example
Example: Study the association between coffee drinking and lung cancer

              LC Yes   LC No
Coffee Yes    80       15
Coffee No     20       85

OR= (80 x 85)/ (15 x 20)= 22.7
What would you conclude????

Confounding Bias: Minimize bias
Research Design:
-Use of randomized clinical trial
-Restriction
Data Analysis:
-Multivariate statistical techniques
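Returning to the coffee drinking and lung cancer table, the odds ratio is the cross-product ratio of the 2x2 table. A minimal sketch follows; the function name `odds_ratio` and the cell labels a, b, c, d are conventions chosen here for illustration. The result is a crude (unadjusted) estimate, since the table does not account for any third variable.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as:

                   outcome yes   outcome no
    exposed             a             b
    unexposed           c             d
    """
    return (a * d) / (b * c)

# Coffee drinking (exposure) by lung cancer (outcome), from the table above.
crude_or = odds_ratio(a=80, b=15, c=20, d=85)
print(f"OR = {crude_or:.1f}")
```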
Multivariate analyses
-Logistic Regression (If outcome is 2 levels)
-Multiple Linear Regression (If outcome is continuous)
Multivariate Analysis is used for adjusting for confounding variables.

Multivariate Analysis
WHY?
-To investigate the effect of more than one independent variable.
-Predict the outcome using various independent variables.
-Adjust for confounding variables