Incidence and Interpretation of Statistical Suppression in Psychological Research

Naomi Martinez Gutierrez & Robert A. Cribbie
Quantitative Methods Program, Department of Psychology, York University

Abstract

Suppressors are third variables that increase the predictive power of one or more predictors by suppressing their irrelevant variance when included in a regression model. Although theoretically and statistically useful, no research has addressed the frequency or interpretation of statistical suppression (SS) in the psychological literature. Two studies explored the nature and interpretation of SS. In the first study, regression analyses were reviewed to determine the frequency with which SS occurs in psychological articles published in 2017. Results indicate that approximately one-third of articles showed evidence of SS, although researchers did not acknowledge or attempt to interpret the SS. The second study reviewed articles containing the keyword 'suppression' to assess the interpretations provided by researchers who identified SS. Results indicate that most researchers do not attempt to classify or interpret SS. Therefore, although SS is common in psychology, scarcely any attempts are made to identify, classify, and/or interpret it.

Incidence and Interpretation of Statistical Suppression in Psychological Research

Suppressors are variables that remove irrelevant variance from other predictors included in a model, thereby increasing the predictive validity of the variable(s) whose irrelevant variance has been suppressed (Conger, 1974). In other words, suppressors unmask relationships between predictor(s) and outcomes, increasing each suppressed variable's predictive power. Despite their usefulness, statistical suppression (SS) remains misunderstood and underreported (Pandey & Elliot, 2010), and no previous studies have investigated the frequency of SS within psychology or, subsequently, its interpretation (or lack thereof). Thus, the purpose of this study is to report on the occurrence and interpretation of SS within psychological research. Given the paucity of research on SS, a better understanding of the frequency with which SS occurs, and of the nature of the interpretations of SS provided by researchers, will help clarify the extent to which SS is an issue warranting further investigation.

Quantifying Statistical Suppression

There are a variety of statistical models used in the field of psychology (e.g., multiple regression, structural equation modeling, hierarchical linear modeling), most of which can easily produce the statistical information necessary to identify the presence of SS. For example, imagine a researcher interested in the partial effects of predictors X1 and X2 on outcome y who adopts the model:

$$y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i,$$

where $y_i$ is the outcome value for individual i, $b_0$ is the predicted value of $y_i$ when X1 and X2 are 0, $b_1$ and $b_2$ are the partial regression coefficients (slopes) for predicting y from X1 and X2, respectively, and $e_i$ is the portion of $y_i$ not explained by $b_0 + b_1 X_{1i} + b_2 X_{2i}$.
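To make this model concrete, the sketch below fits such a two-predictor regression with ordinary least squares. It is a minimal illustration on simulated data; the variable names, seed, sample size, and coefficient values are hypothetical choices and are not taken from the studies reviewed here.

```python
# Minimal sketch: fitting y_i = b0 + b1*X1_i + b2*X2_i + e_i to simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2017)               # arbitrary seed
n = 200
x1 = rng.normal(size=n)                         # predictor X1
x2 = 0.5 * x1 + rng.normal(size=n)              # predictor X2, correlated with X1
y = 0.4 * x1 + 0.1 * x2 + rng.normal(size=n)    # outcome (hypothetical effects)

X = sm.add_constant(np.column_stack([x1, x2]))  # prepend the intercept (b0) column
fit = sm.OLS(y, X).fit()                        # ordinary least squares

print(fit.params)      # estimates of b0, b1, b2
print(fit.resid[:5])   # e_i: the portion of y_i left unexplained by the model
```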
The important statistics for exploring SS include the raw sample correlations ($r$, which are identical to standardized regression coefficients in single-predictor models with standardized variables), the standardized partial regression weights ($\beta$), and the partial ($pr$) and semipartial ($sr$) correlations, all of which indicate the strength of the relationship between a predictor and an outcome. $\beta$ refers to the partial regression coefficient from a model in which all variables have been transformed (standardized) to have a mean of 0 and a standard deviation of 1, $pr$ refers to the correlation between a predictor and a criterion when a third variable's variance is removed from both the predictor and the criterion, and $sr$ refers to the correlation between a predictor and a criterion when a third variable's variance is removed from the predictor only. $pr^2$ and $sr^2$ provide an estimate of the proportion of variance in an outcome variable that can be explained by a predictor when other predictors are partialled out (e.g., $sr^2$ represents the increase or decrease in the model $R^2$ if a predictor is added to or removed from the model, respectively). When there are only two predictors and an outcome ($y$) in a model, the formula for $sr$ can be written as:

$$r_{y(1.2)} = \frac{r_{y1} - r_{y2} r_{12}}{\sqrt{1 - r_{12}^2}}, \qquad (1)$$

wherein $r_{y(1.2)}$ reflects the $sr$ between the outcome $y$ and predictor X1 when the variance of predictor X2 is removed from predictor X1 only, $r_{y1}$ is the correlation between $y$ and X1, $r_{y2}$ is the correlation between $y$ and X2, and $r_{12}$ is the correlation between X1 and X2. Similarly, the formula for $pr$ is:

$$r_{(y1).2} = \frac{r_{y1} - r_{y2} r_{12}}{\sqrt{1 - r_{12}^2}\sqrt{1 - r_{y2}^2}}. \qquad (2)$$

The standardized coefficient $\beta$ is defined as the expected standard deviation change in the outcome per standard deviation change in the respective predictor. In a model with two predictors and an outcome, the formula for $\beta$ is:

$$\beta_{y1.2} = \frac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2}, \qquad (3)$$

wherein $\beta_{y1.2}$ is the partial regression coefficient for X1, partialling out the effects of X2. Equations 1 and 3 are related in that they share the same numerator. The denominators, however, differ such that (combining Equations 1 and 3):

$$\beta_{y1.2} = \frac{r_{y(1.2)}}{\sqrt{1 - r_{12}^2}}. \qquad (4)$$

Equations 2 and 3 are also similar but differ in their denominators, such that (combining Equations 2 and 3):

$$\beta_{y1.2} = \frac{\sqrt{1 - r_{y2}^2}\, r_{(y1).2}}{\sqrt{1 - r_{12}^2}}. \qquad (5)$$

In a SS effect, $pr$, $sr$, or $\beta_{y1.2}$ will be either larger in magnitude or different in sign relative to the respective raw correlation or regression coefficient ($r_{y1}$/$\beta_{y1}$; Velicer, 1968). In a two-predictor regression model, this occurs because one predictor is suppressing, or explaining, irrelevant variance within the other predictor, thereby making the suppressed predictor's relationship with the outcome stronger. This can be further explained using models in which all variables have been standardized. Equations 6 and 7 below represent simple linear regression models; Equation 8 is the multiple regression model that includes both of the predictors from Equations 6 and 7. Here we use X1 and X2 to represent the standardized predictor variables and $y$ to represent the standardized outcome variable.

$$y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i \qquad (6)$$

$$y_i = \beta_0 + \beta_2 X_{2i} + \varepsilon_i \qquad (7)$$

$$y_i = \beta_0 + \beta_3 X_{1i} + \beta_4 X_{2i} + \varepsilon_i \qquad (8)$$

Statistical suppression occurs when $|\beta_3|$ is greater than $|\beta_1|$ and/or $|\beta_4|$ is greater than $|\beta_2|$ (or the signs of the coefficients are reversed, e.g., $\beta_3 = .2$, $\beta_1 = -.3$).
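As a concrete check, the short sketch below applies Equations 1 through 3 to a set of hypothetical correlations and tests the suppression condition just described; with standardized variables, $\beta_{y1.2}$ plays the role of $\beta_3$ and $r_{y1}$ the role of $\beta_1$. The correlation values are invented for illustration and are not taken from any study discussed here.

```python
# Minimal sketch of Equations 1-3 and the suppression check for predictor X1.
import math

# Hypothetical raw correlations among y, X1, and X2
r_y1, r_y2, r_12 = 0.30, 0.10, -0.40

sr = (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)                               # Equation 1
pr = (r_y1 - r_y2 * r_12) / (math.sqrt(1 - r_12 ** 2) * math.sqrt(1 - r_y2 ** 2))  # Equation 2
beta = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)                                      # Equation 3

# Suppression: partial coefficient larger in magnitude than, or opposite in sign
# to, the raw correlation r_y1
suppressed = abs(beta) > abs(r_y1) or beta * r_y1 < 0
print(f"sr = {sr:.3f}, pr = {pr:.3f}, beta = {beta:.3f}, suppression for X1: {suppressed}")
```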
From the previous discussion, the partial $\beta$ coefficients could also be converted to $pr$ or $sr$, wherein $|r_{(y1).2}|$ or $|r_{y(1.2)}|$ would be compared to $|r_{y1}|$. In summary, derivatives of multiple regression models consist of similar statistics (e.g., $\beta$, $sr$, $pr$) that allow researchers to determine the allocation of variance amongst the variables involved. When the $\beta$, $pr$, and/or $sr$ associated with a particular predictor is greater in magnitude or opposite in sign relative to its raw correlation, this is called a SS effect. There are, however, different subtypes of this effect; these are discussed in the following section.

Types of Statistical Suppression

Absolute Suppression. Absolute SS was first explained by Horst et al. (1941). Using a two-predictor model as an example, absolute SS occurs when one predictor suppresses irrelevant variance in the other, and thus the magnitude of the $\beta$/$sr$/$pr$ associated with the suppressed predictor becomes larger than its raw correlation with the outcome (e.g., $\beta_3 = -.4$, $\beta_1 = -.2$, $\beta_4 = .2$, $\beta_2 = .3$). In other words, $\beta_3$ is greater in magnitude than $\beta_1$ but retains the same sign. In this situation, X2 acts as the suppressor. A more restrictive version of absolute SS is classical suppression, wherein the suppressor does not correlate with the outcome.

A valuable example of absolute SS was provided by Horst (1966). During World War II, the success of pilot training, the outcome, was predicted by paper-and-pencil tests measuring mechanical, numerical, and spatial ability. Although verbal ability did not predict pilot training success, it did correlate with the three abilities being measured, since verbal skills were needed to read and comprehend the tests in the first place. The relationship between the other abilities and the success of pilot training was strengthened when verbal ability was included in the model (i.e., controlled for). Here, verbal ability is an absolute suppressor. Accordingly, it is useful to include it in the regression model so that it can suppress the irrelevant variance in the other predictors related to test-taking ability, thereby enhancing their predictive power.

Negative Suppression. Following Horst et al. (1941), negative SS was discussed by Lubin (1957). Continuing with the two-predictor model example, negative SS occurs when the addition of a suppressor variable to a model causes the sign of the other predictor's $\beta$/$sr$/$pr$ to be reversed. For example, negative SS would be visible if the sign of $\beta_3$ were opposite to that of $\beta_1$ (e.g., $\beta_3 = -.2$, $\beta_1 = .2$). In this scenario, X2 acts as the suppressor. An example of negative SS comes from a study on sleep deprivation by Tagler, Stanko, and Forbey (2017). The raw correlation between perceived behavioural control and actigraphy sleep duration was positive (r = .18); however, the standardized partial coefficient was negative (β = -0.07). Given that only two predictors were included in the multiple regression analysis, it is possible to pinpoint the second predictor (intention) as the negative suppressor.

Mutual Suppression. Lastly, mutual SS was introduced by Conger (1974). In this situation, both predictors in the two-predictor model obtain a $\beta$/$sr$/$pr$ larger in magnitude relative to their respective raw correlations with the outcome.
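To summarize the three subtypes, the sketch below implements the classification logic in a small helper function. It assumes standardized variables (so the simple-model coefficients $\beta_1$ and $\beta_2$ equal the raw correlations $r_{y1}$ and $r_{y2}$), and the function name, decision order, and numeric example are illustrative choices rather than a procedure given by the authors.

```python
# Minimal sketch: classify the suppression subtype from standardized coefficients.
def classify_suppression(beta1, beta2, beta3, beta4):
    """Compare the simple-model coefficients (beta1 for X1, beta2 for X2) with the
    two-predictor-model coefficients (beta3 for X1, beta4 for X2)."""
    x1_sign_reversed = beta3 * beta1 < 0            # sign reversal for X1
    x2_sign_reversed = beta4 * beta2 < 0            # sign reversal for X2
    x1_larger = (not x1_sign_reversed) and abs(beta3) > abs(beta1)
    x2_larger = (not x2_sign_reversed) and abs(beta4) > abs(beta2)

    if x1_sign_reversed or x2_sign_reversed:
        return "negative suppression"
    if x1_larger and x2_larger:
        return "mutual suppression"
    if x1_larger or x2_larger:
        return "absolute suppression"
    return "no suppression"

# The absolute-suppression values from the text: beta3 = -.4, beta1 = -.2, beta4 = .2, beta2 = .3
print(classify_suppression(beta1=-0.2, beta2=0.3, beta3=-0.4, beta4=0.2))  # -> "absolute suppression"
```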