
Learn About Partial Correlation in SPSS With Data From Fisher’s Iris Dataset (1936)

© 2019 SAGE Publications, Ltd. All Rights Reserved.

Student Guide

Introduction

This example introduces the partial correlation (also called $r_{XY.Z}$ or partial r). Partial r allows researchers to quantify the linear association between two quantitative variables while removing the effects of one or more other variables, often described as the correlation between X and Y holding Z constant. Like Pearson’s r, partial r ranges between −1 and 1, with more extreme values implying stronger association. It is frequently used by researchers who want to quantify the association between two variables when one or more other variables are presumed to be confounders, that is, when X and Y are both a function of Z.

This example describes partial correlation, discusses the assumptions underlying it, and shows how to compute and interpret it. We illustrate partial correlation using Fisher’s Iris dataset (1936). Specifically, we quantify the partial linear association between two flower properties (sepal length and petal length), controlling for two other flower properties (sepal width and petal width). We also test a hypothesis that the partial correlation is zero. This page provides a link to this sample dataset and a guide to producing the partial correlation coefficient and testing a hypothesis that the population value is 0 using statistical software.

What Is Partial Correlation?

This example introduces readers to the partial correlation statistic, typically denoted $r_{XY.Z}$ or partial r, where r is an estimate of a population correlation value, typically denoted ρ. This statistic quantifies the linear association between two variables (X and Y) conditioning on (also called partialling out, removing the effects of, or holding constant) one or more other variables (Z). It allows researchers to quantify the linear association between two variables with the effects of specified variables removed. It is frequently used in conjunction with a theoretical model, for example, when the two variables of interest are each partially a function of another variable or set of variables but are also hypothesized to be related to each other. A similar concept, the semipartial (or part) correlation, is also discussed below. As the distributions of partial r and semipartial r are known under certain assumptions, a hypothesis test of ρ = 0 (or of equality to some other constant) is available under those assumptions. Before introducing the partial correlation coefficient, it is important to understand what is being quantified, under what typical assumptions, and, if applicable, what specific hypothesis is being tested.

The null hypothesis associated with the partial correlation coefficient is $\rho_{XY.Z} = k$: the correlation between X and Y conditioned on Z equals a constant k, and the most common value for k is 0, implying no partial linear association. The p-value for the null hypothesis $\rho_{XY.Z} = 0$ is the same as the p-value for $\beta_1 = 0$ in the regression model $y = \beta' Z + \beta_1 x + \varepsilon$. For $k \neq 0$, an approximate Z test is available.
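This equivalence is easy to verify numerically. The following is a minimal Python sketch (not part of the original SPSS guide) using simulated data with illustrative variable names; it applies the standard t test for partial r, whose degrees of freedom are n − 2 − q, where q is the number of control variables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)                       # a single control variable Z
x = z + rng.normal(size=n)
y = 0.5 * z + 0.3 * x + rng.normal(size=n)

# Partial r: correlate the residuals of y ~ Z and x ~ Z.
Z = np.column_stack([np.ones(n), z])
res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
r_p = np.corrcoef(res_x, res_y)[0, 1]

# t test of H0: rho_XY.Z = 0, with df = n - 2 - (number of controls).
df = n - 2 - 1
t_partial = r_p * np.sqrt(df) / np.sqrt(1 - r_p**2)
p_partial = 2 * stats.t.sf(abs(t_partial), df)

# t test of beta_1 in the single model y = b0 + b*z + b1*x + e.
X = np.column_stack([np.ones(n), z, x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
t_slope = beta[2] / se_b1
p_slope = 2 * stats.t.sf(abs(t_slope), n - X.shape[1])

print(p_partial, p_slope)    # identical up to floating-point error
```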

For example, suppose we are interested in the association between age and spending among adults aged 18–40. We know that income increases with age, as does spending. We may use partial r to quantify the association between age and spending, adjusting for income, and test the hypothesis that there is no partial association, $\rho_{\text{age,spending}.\text{income}} = 0$. Partial r is also used with mediation models to quantify direct effects. Using the previous example, we may say that income increases with age and spending increases with income, so there is an indirect effect of age on spending. Removing the effect of income, we can obtain the direct effect of age on spending and quantify the association using partial r.

Calculations

Partial Correlation

Partial correlation is the Pearson product–moment correlation between the residuals of two regression models:

Model 1: $y = \alpha_1 + \beta_1 Z + \varepsilon_1$
Model 2: $x = \alpha_2 + \beta_2 Z + \varepsilon_2$

Partial $\rho_{XY.Z} = \mathrm{corr}(\varepsilon_1, \varepsilon_2)$, the correlation between the error terms. As $\beta_1$, $\beta_2$, $\varepsilon_1$, and $\varepsilon_2$ are unknown, we use the sample estimates from an ordinary least squares regression model and obtain $r_{XY.Z} = \mathrm{corr}(\hat{\varepsilon}_1, \hat{\varepsilon}_2)$. Statistical software packages use a more efficient algorithm that yields an identical result.
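As a concrete illustration, the sketch below (Python, simulated data, not from the original guide) computes partial r from the two sets of residuals and checks it against the standard closed-form identity for a single control variable, $r_{XY.Z} = (r_{XY} - r_{XZ} r_{YZ}) / \sqrt{(1 - r_{XZ}^2)(1 - r_{YZ}^2)}$, which is the kind of shortcut software packages exploit.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + 0.4 * x + rng.normal(size=n)

# Definition: correlate the residuals of x ~ Z and y ~ Z.
Z = np.column_stack([np.ones(n), z])
e_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
e_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
r_resid = np.corrcoef(e_x, e_y)[0, 1]

# Standard closed form for a single control variable.
r_xy = np.corrcoef(x, y)[0, 1]
r_xz = np.corrcoef(x, z)[0, 1]
r_yz = np.corrcoef(y, z)[0, 1]
r_formula = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

print(r_resid, r_formula)    # agree up to floating-point error
```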

Another conception of partial correlation is through a single model:

$$y = \beta' Z + \beta_1 x + \varepsilon$$

And the decomposition of the variance of y:

$$\sigma^2_y = \sigma^2_{\beta Z} + \sigma^2_{\beta(x \perp Z)} + \sigma^2_{\varepsilon}$$

where $x \perp Z$ is the part of x that is orthogonal to (uncorrelated with) Z.

$$\text{partial } \rho_{XY.Z} = \sqrt{\frac{\sigma^2_{\beta(x \perp Z)}}{\sigma^2_{\beta(x \perp Z)} + \sigma^2_{\varepsilon}}}$$

We use the model-based estimates of the variance components and obtain

$$\text{partial } r_{XY.Z} = \sqrt{\frac{SS(x \perp Z)}{SS(x \perp Z) + SS_{\hat{\varepsilon}}}}$$

where SS is the sum-of-squares. The partial correlation value and the associated hypothesis test are not affected by which variable is denoted x and which is denoted y.
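The sum-of-squares route can be sketched as follows (Python, simulated data, an illustration rather than the guide's SPSS workflow): fit the reduced model y ~ Z and the full model y ~ Z + x, take $SS(x \perp Z)$ as the drop in residual SS, and attach the sign of the fitted coefficient on x.

```python
import numpy as np

def fit_sse(X, y):
    """OLS fit of y on X; return (residual sum of squares, coefficients)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid, beta

rng = np.random.default_rng(2)
n = 400
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z - 0.5 * x + rng.normal(size=n)     # true partial association is negative

ones = np.ones(n)
sse_reduced, _ = fit_sse(np.column_stack([ones, z]), y)       # y ~ Z
sse_full, beta = fit_sse(np.column_stack([ones, z, x]), y)    # y ~ Z + x

ss_x_given_z = sse_reduced - sse_full                         # SS(x ⊥ Z)
partial_r = np.sign(beta[2]) * np.sqrt(ss_x_given_z / (ss_x_given_z + sse_full))
print(partial_r)                         # matches the residual-based estimate
```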

Semipartial Correlation

Consider Model 2 as above:

Model 2: $x = \alpha_2 + \beta_2 Z + \varepsilon_2$

Semipartial $\rho_{YX.Z} = \mathrm{corr}(y, \varepsilon_2)$ and semipartial $r_{YX.Z} = \mathrm{corr}(y, \hat{\varepsilon}_2)$.

As a function of variance components from the regression model $y = \beta' Z + \beta_1 x + \varepsilon$:

$$\text{semipartial } \rho_{YX.Z} = \sqrt{\frac{\sigma^2_{\beta(x \perp Z)}}{\sigma^2_y}}$$

and

$$\text{semipartial } r_{YX.Z} = \sqrt{\frac{SS(x \perp Z)}{SS(y)}}$$

The value of the semipartial correlation coefficient is affected by the assignment of the variables to x and y.

When the partial or semipartial correlations are estimated using SS, the value takes the sign of the corresponding $\hat{\beta}$. As $\sigma^2_y \geq \sigma^2_{\beta(x \perp Z)} + \sigma^2_{\varepsilon}$, the partial correlation is always equal to or larger in absolute value than the semipartial correlation.
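A short sketch makes the contrast concrete (Python, simulated data, not from the original guide): the semipartial correlates raw y with the Z-adjusted part of x, whereas the partial adjusts both variables, so its denominator is smaller and its absolute value at least as large.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + 0.4 * x + rng.normal(size=n)

Z = np.column_stack([np.ones(n), z])
e_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # x with Z partialled out
e_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # y with Z partialled out

semipartial = np.corrcoef(y, e_x)[0, 1]   # corr(y, x ⊥ Z): only x is adjusted
partial = np.corrcoef(e_y, e_x)[0, 1]     # corr(y ⊥ Z, x ⊥ Z): both adjusted
print(semipartial, partial)               # |semipartial| <= |partial|
```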

Ordered Semipartial Correlation

Consider the linear regression model $y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_p x_p + \varepsilon$ and

$$\sigma^2_y = \sigma^2_{\beta_1 x_1^*} + \sigma^2_{\beta_2 x_2^*} + \ldots + \sigma^2_{\beta_p x_p^*} + \sigma^2_{\varepsilon}$$

where $x_i^*$ is the portion of $x_i$ orthogonal to all $x_j$, $j < i$. Ordered semipartial correlations are then obtained:

$$\text{ordered semipartial correlation } \rho_{yx_i} = \sqrt{\frac{\sigma^2_{\beta_i x_i^*}}{\sigma^2_y}}$$

Estimates of the ordered semipartial correlations are obtained using the Type I sums-of-squares, $SS_1$:

$$\text{ordered semipartial correlation } r_{yx_i} = \sqrt{\frac{SS_1(x_i)}{SS(\text{Total})}}$$

As the $SS_1$ are computed sequentially, using $x_i$ or $x_i^*$ for modeling yields the same result. The squared ordered semipartial correlations sum to the model $R^2$, so they can be used to decompose the total variance of y into components due to each x variable, adjusting for all previous x variables. The order in which the variables are entered into the model is important and should be considered carefully.
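The decomposition can be checked numerically. The sketch below (Python, simulated data, not from the original guide) computes the Type I SS by entering predictors one at a time and confirms that the squared ordered semipartial correlations sum to the model $R^2$.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)        # predictors deliberately correlated
y = x1 + 0.5 * x2 + rng.normal(size=n)

ss_total = np.sum((y - y.mean()) ** 2)
X = np.ones((n, 1))                        # intercept-only model to start
prev_sse = sse(X, y)
squared_parts = []
for xi in (x1, x2):                        # enter predictors in a fixed order
    X = np.column_stack([X, xi])
    cur_sse = sse(X, y)
    squared_parts.append((prev_sse - cur_sse) / ss_total)  # SS1(xi)/SS(Total)
    prev_sse = cur_sse

r2_model = 1 - prev_sse / ss_total
print(sum(squared_parts), r2_model)        # equal: parts sum to the model R^2
```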

Requirements and Assumptions

Hypothesis tests have assumptions. If the assumptions are not met, the methods may still be applied, but error rates may be compromised. Understanding the assumptions will improve your research design and efforts. There are two ways of approaching partial r, with two sets of assumptions:

1. Two regression models, $x = \beta_1 Z + \varepsilon_1$ and $y = \beta_2 Z + \varepsilon_2$:

• Both models are correctly specified.
• The $\varepsilon_1$ are independent of each other.
• The $\varepsilon_2$ are independent of each other.
• The values are obtained so as to manifest properties of simple random sampling.
• $\varepsilon_1 \sim N(0, \sigma_1^2)$ and $\varepsilon_2 \sim N(0, \sigma_2^2)$, and $\sigma_1^2$ and $\sigma_2^2$ are both constant.

2. The regression model $y = \beta' Z + \beta_1 x + \varepsilon$:

• The model is correctly specified.
• All $\varepsilon$ are independent.
• The values are obtained so as to manifest properties of simple random sampling.
• $\varepsilon \sim N(0, \sigma^2)$, where $\sigma^2$ is constant.

These are the usual assumptions for hypothesis testing within the context of the linear regression model. The choice between the two sets of assumptions is a theoretical consideration only; the value of partial r and the associated p-value are the same under both approaches.

Illustrative Example: Partial Correlations Among Flower Properties in Iris

This example presents the use of partial correlation between two flower properties (sepal length and petal length), controlling for two other flower properties (sepal width and petal width), all obviously related to size. This is relevant because researchers may be interested in relations between biological properties controlling for other biological properties. In this example, we propose that the association between sepal length and petal length is a function only of overall size, which also manifests in sepal width and petal width, so the adjusted correlation should be low. Partial correlation, and the corresponding hypothesis test, will help to assess the tenability of this argument.


The Data

This example uses Fisher’s Iris dataset (https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data). The variables in the dataset comprise a series of measurements on iris plants: sepal width, sepal length, petal width, and petal length, each a function of plant size. We are interested in whether sepal length and petal length are correlated, controlling for sepal width and petal width.

Analysing the Data

There are several ways to perform this analysis. The first is regression based and is made simple in some software packages. The second is direct, using a software package that specifically computes partial correlations (in SPSS, the PARTIAL CORR procedure). The correlation between sepal length and petal length is high, 0.87. The partial correlation between sepal length and petal length, controlling for sepal width and petal width, is 0.72, p < .0001; the adjusted association is also large.
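For readers who want to replicate these numbers outside SPSS, here is a minimal Python sketch that reads the dataset from the UCI link given above and computes both correlations via the residual method; the column names and file layout are assumptions about that file.

```python
import numpy as np
import pandas as pd
from scipy import stats

url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "iris/iris.data")
iris = pd.read_csv(url, header=None, names=[
    "sepal_length", "sepal_width", "petal_length", "petal_width", "species"])

y = iris["sepal_length"].to_numpy()
x = iris["petal_length"].to_numpy()
Z = np.column_stack([np.ones(len(iris)),
                     iris["sepal_width"].to_numpy(),
                     iris["petal_width"].to_numpy()])

print(np.corrcoef(x, y)[0, 1])              # zero-order r, about 0.87

# Partial r: correlate the residuals after regressing each variable on Z.
e_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
e_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
r_p = np.corrcoef(e_x, e_y)[0, 1]

df = len(iris) - 2 - 2                      # n - 2 - (number of controls)
t = r_p * np.sqrt(df) / np.sqrt(1 - r_p ** 2)
p = 2 * stats.t.sf(abs(t), df)
print(r_p, p)                               # about 0.72, p < .0001
```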

Presenting Results

The results for the partial correlation may be reported as follows:

“We used Fisher’s Iris data to test the following null hypothesis:

H0: There is no association between sepal length and petal length, controlling for sepal width and petal width, in iris.

The dataset includes observations on 150 individual plants. The zero-order correlation between sepal length and petal length is high, 0.87. After controlling for sepal width and petal width, using a partial correlation, the association is still large (partial r = 0.72, p < .0001), so we reject the null hypothesis.”

Review

Partial correlation, partial r, is used to quantify the linear association between two variables controlling for one or more other variables. Under a null hypothesis that the partial correlation is zero, the distribution of partial r is known, so an exact hypothesis test is possible, under assumptions.

You should know:

• What partial correlation is.
• What semipartial correlation is.
• The differences between partial and semipartial correlation.
• What sequential semipartial correlation is and why it is useful.
• If applicable, what hypothesis is being tested.
• Assumptions associated with the hypothesis test.

Your Turn

Download this sample dataset to see whether you can replicate these results. There are four moderately to highly correlated variables in the dataset. Several partial and semipartial correlations are obtainable.
