Why Overfitting Is Not (Usually) a Problem in Partial Correlation Networks

PsyArXiv

Donald R. Williams & Josue E. Rodriguez
University of California, Davis

Network psychometrics is undergoing a time of methodological reflection. In part, this was spurred by the revelation that $\ell_1$-regularization does not reduce spurious associations in partial correlation networks. In this work, we address another motivation for the widespread use of regularized estimation: the thought that it is needed to mitigate overfitting. We first clarify important aspects of overfitting and the bias-variance tradeoff that are especially relevant for the network literature, where the number of nodes or items in a psychometric scale is not large compared to the number of observations (i.e., a low $p/n$ ratio). This revealed that bias and especially variance are most problematic at $p/n$ ratios that are rarely encountered. We then introduce a nonregularized method, based on classical hypothesis testing, that fulfills two desiderata: (1) reducing or controlling the false positive rate and (2) quelling concerns of overfitting by providing accurate predictions. These were the primary motivations for initially adopting the graphical lasso (glasso). In several simulation studies, our nonregularized method provided more than competitive predictive performance, and, in many cases, outperformed glasso. It appears to be nonregularized, as opposed to regularized, estimation that best satisfies these desiderata. We then provide insights into using our methodology. Here we discuss the multiple comparisons problem in relation to prediction: stringent alpha levels, resulting in a sparse network, can deteriorate predictive accuracy. We end by emphasizing key advantages of our approach that make it ideal for both inference and prediction in network analysis.
Keywords: partial correlation network, overfitting, prediction, frequentist inference, mean squared error

DRW was supported by a National Science Foundation Graduate Research Fellowship under Grant No. 1650042. All code to reproduce the simulations and figures is available online (https://osf.io/fm92b/).

In the social-behavioral sciences, network theory has emerged as an increasingly popular framework for understanding psychological constructs (Borsboom, 2017; Jones, Heeren, & McNally, 2017). The underlying rationale is that a group of observed variables, say, self-reported symptoms, are a dynamic system that mutually influence and interact with one another (Borsboom & Cramer, 2013). The observed variables are “nodes” and the featured connections between nodes are “edges.” This work focuses on partial correlation networks, wherein the edges represent conditionally dependent nodes, that is, pairwise relations that have controlled for the other nodes in the network (Epskamp, Waldorp, Mottus, & Borsboom, 2018). This powerful approach has resulted in an explosion of research; for example, network analysis has been used to shed new light upon a variety of constructs including personality (Costantini et al., 2015), narcissism (Di Pierro, Costantini, Benzi, Madeddu, & Preti, 2019), and hypersexuality (Werner, Štulhofer, Waldorp, & Jurin, 2018).

Recently, the foundation of network psychometrics was improved upon when the default methodology was revisited (Williams & Rast, 2019; Williams, Rhemtulla, Wysocki, & Rast, 2019). In the network literature, $\ell_1$-regularization (a.k.a., “least absolute shrinkage and selection operator” or “lasso”) emerged as the default approach for detecting conditionally dependent relations. Initially, it was motivated by the thought that it reduces spurious relations. Paradoxically, the exact opposite holds true: lasso is known to not select the correct model (Zhao & Yu, 2006) and to have a relatively low false negative rate. For the latter, in structural equation models (SEM), it was noted that “lasso kept more variables in the model (more Type I and fewer Type II errors)” (p. 72, Jacobucci, Brandmaier, & Kievit, 2019). In network models, it was recently demonstrated that the inflated false positive rate inherent to lasso depends on many factors, including the sample size, edge size, sparsity, and the number of nodes (see Figure 6 in Williams et al., 2019). On the other hand, nonregularized estimation has a lower false positive rate that does not depend on those factors (Williams & Rast, 2019). Together, it is now clear that limiting false positives does not motivate the default status of $\ell_1$-regularization.

In this work, we seek to further improve network analysis by revisiting another purported benefit of using $\ell_1$-regularization. Additional motivation for using lasso is the thought that it can mitigate overfitting. The underlying rationale is summarized in Fried and Cramer (2017):

    In the case of network models, overfitting is an especially severe challenge because we investigate relationships among a large number of variables, which means there is danger of overfitting a large number of parameters. One way to mitigate this problem somewhat is to regularize networks (p. 1011).

Although Fried and Cramer (2017) also stated “it is unclear at present to what degree regularization techniques increase generalizability” (p. 1011), it has nonetheless permeated the network literature. Indeed, the emerging consensus is that $\ell_1$-regularization provides “protection against overfitting” (p. 16, Christensen & Golino, 2019). We argue that such statements are not entirely clear, as, for example, it is all but guaranteed that the “fit to the data” (measured by predictive accuracy) will be better for training data than unseen data; some degree of overfitting is a foregone conclusion. In our opinion, it is also not readily apparent that overfitting is a “severe challenge” that motivates regularization compared to nonregularized estimation. Because regularization has serious ramifications on statistical inference, such as presenting issues for computing valid confidence intervals (see for example section 3.1 in Bühlmann, Kalisch, & Meier, 2014), the guiding idea behind this work is that the “challenge” of overfitting must warrant sacrificing gold-standard statistical approaches (e.g., ordinary least squares and p-values).

The network literature provides a unique opportunity to investigate predictive methods in the social-behavioral sciences. This is because nodewise predictability has an important place in both network theory and analysis (Haslbeck & Fried, 2017; Haslbeck & Waldorp, 2018). The basic idea is to see how well a given node is predicted by the other nodes in the network. The primary motivation of this approach is to determine “how self-determined the network is” (p. 860, Haslbeck & Waldorp, 2018). This is accomplished by computing variance explained for each node in the network, given the corresponding “neighborhood” of conditional relations (i.e., shared connections with other nodes, Meinshausen & Bühlmann, 2006). This approach is used extensively in the network literature, including as an outcome measure in clinical interventions (see for example Blanken et al., 2019). Therefore, it is important to develop methodology that brings [...]

[...] predictive performance compared to $\ell_1$-regularization. However, in certain situations, this is a robust finding in the model selection literature (see references in Bertsimas, King, & Mazumder, 2016; Mazumder, Radchenko, & Dedieu, 2017). This is summarized in Hastie, Tibshirani, and Tibshirani (2017):

    The lasso gives better accuracy results than [nonregularized] best subset selection in the low signal-to-noise ratio (SNR) range and worse accuracy than best subset in the high SNR range. The transition point—the SNR level past which best subset outperforms the lasso—varies depending on the problem dimensions (p. 17).

The transition point refers to the number of variables ($p$) relative to the number of observations ($n$). Hence, even in noisy situations, the benefits of lasso diminish or vanish altogether when $n$ is much larger than $p$. This is the customary situation in the network literature. The remaining consideration is the SNR (i.e., $R^2 / (1 - R^2)$). Nodes are usually items from a validated psychometric scale, that, by construction, should not have a terribly low SNR. Together, this suggests that the oft-touted protective shield of regularization may be overstated.

The thought that regularization is advantageous for limiting overfitting extends beyond the network literature. In the social-behavioral sciences, this is typically presented with some combination of regularization and the bias-variance tradeoff (Jacobucci et al., 2019; Yarkoni & Westfall, 2017). It is often argued that biasing estimates with regularization can be advantageous, for example, “[lasso] introduces bias in the parameter estimates in order to avoid overfitting” (p. 6, Ryan, Bringmann, & Schuurman, 2019). However, it is important to note that parameter and prediction bias are not the same thing. Indeed, from the statistics literature, “the nonzero estimates from the lasso tend to be biased toward zero, so the debiasing [removing parameter bias]...can often improve the prediction error of the model” (p. 16, Hastie, Tibshirani, & Wainwright, 2015). In addition to proposing novel methodology, another major contribution of this work is clarifying possible confusion surrounding the bias-variance tradeoff and overfitting.

Major Contribution

Our major contribution is explicitly examining overfitting in partial correlation networks. In the literature, the general thought is that
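
To make nodewise predictability and the SNR quantity $R^2 / (1 - R^2)$ concrete, the following is a minimal sketch and not the authors' code. It assumes simulated equicorrelated Gaussian data, and the names `nodewise_r2` and `snr`, the sample sizes, and the correlation value are hypothetical choices made for illustration. Each node is regressed on all other nodes with ordinary least squares (no regularization and no edge selection), and variance explained is computed on both the training data and held-out data at a low $p/n$ ratio.

```python
import numpy as np


def nodewise_r2(train, test):
    """R^2 on `test` for each node, using OLS weights fitted on `train`."""
    n_nodes = train.shape[1]
    center = train.mean(axis=0)  # center both sets with training means only
    train_c, test_c = train - center, test - center
    r2 = np.empty(n_nodes)
    for j in range(n_nodes):
        others = np.delete(np.arange(n_nodes), j)  # all nodes except node j
        beta, *_ = np.linalg.lstsq(train_c[:, others], train_c[:, j], rcond=None)
        resid = test_c[:, j] - test_c[:, others] @ beta
        r2[j] = 1.0 - np.mean(resid ** 2) / np.mean(test_c[:, j] ** 2)
    return r2


def snr(r2):
    """Signal-to-noise ratio implied by variance explained: R^2 / (1 - R^2)."""
    return r2 / (1.0 - r2)


# Assumed simulation settings (illustrative only): p = 10 nodes, n = 500, p/n = 0.02.
rng = np.random.default_rng(1)
p, n, rho = 10, 500, 0.3
cov = np.full((p, p), rho)
np.fill_diagonal(cov, 1.0)
data = rng.multivariate_normal(np.zeros(p), cov, size=2 * n)
train, test = data[:n], data[n:]

r2_in = nodewise_r2(train, train)   # fit and evaluate on the same data
r2_out = nodewise_r2(train, test)   # evaluate on unseen data

print("mean in-sample R^2:     ", r2_in.mean())
print("mean out-of-sample R^2: ", r2_out.mean())
print("SNR from out-of-sample: ", snr(r2_out.mean()))
```

Comparing `r2_in` with `r2_out` gives a rough sense of how much nonregularized estimates overfit when $n$ is much larger than $p$. Raising `p` or lowering `n` in the sketch moves the setting toward the high $p/n$ ratios at which variance becomes a concern.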
