Better an approximate answer to the right question than the exact answer to the wrong question1: the case of the psychometric analysis of the ASQ:SE

Abstract: One of the main activities in psychometrics is to analyze the internal structure of a test. Multivariate statistical methods, including Exploratory Factor Analysis (EFA) and Principal Component Analysis (PCA), are frequently used to do this, but the growth of Network Analysis (NA) places this method as a promising candidate. The results obtained by these methods are of valuable interest, as they not only produce evidence to explore whether the test is measuring its intended construct, but also speak to the substantive theory that motivated the test's development. However, these different statistical methods come up with different answers, providing the basis for different analytical and theoretical strategies when one needs to choose a solution. In this study, we took advantage of a large volume of published data (n = 22,331) obtained with the Ages and Stages Questionnaires: Social-Emotional (ASQ:SE), and formed a subset of 500 children to present and discuss alternative psychometric solutions to its internal structure, and also to its underlying theory. The analyses were based on a polychoric matrix, the number of factors to retain followed several well-known rules of thumb, and a wide range of exploratory methods was fitted to the data, including EFA, PCA, and NA. The statistical outcomes were divergent, varying from 1 to 6 domains, allowing a flexible interpretation of the results. We argue that the use of statistical methods in the absence of a well-grounded psychological theory has limited applications, despite its appeal. All data and code are available at https://osf.io/z6gwv/.

Keywords: Psychometrics, Multivariate analysis, Ages and Stages Questionnaire, Internal Structure.

Introduction

Psychology and measurement are intrinsically related, which partially explains the fundamental role that statistics and psychometrics play in this field (Stigler, 1992; Wright, 2009). Despite their close relationship, their roles may differ with regard to psychological testing. While psychometrics is more concerned with the development of psychological instruments, statistics comes into the scene to provide the methods which psychometricians rely on to check certain properties of a new tool during the data analysis process (Furr & Bacharach, 2014). Although these two areas are core elements of the scientific method in Psychology, and psychologists have contributed profusely to measurement theory through both, the two disciplines are rooted in independent traditions, which may result in disagreement regarding some analyses (Box, 1976). This is particularly true when psychometricians want to analyze the data obtained through a psychological test, and especially when the analysis of the internal structure of a measurement tool is needed (Golino & Epskamp, 2017; Preacher & MacCallum, 2003).

1 This title is inspired by John Tukey's famous quote "Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise," reported in his own work entitled "The future of data analysis," published in the Annals of Mathematical Statistics, on page 13 (1962). However, Tukey himself presented the sentence as a quotation, which leaves us uncertain whether he borrowed the phrase rather than authored it.

The fact that statistics is a formal science must not be forgotten. Thus, in viewing this process through a statistical lens, there is an entire list of assumptions which one needs to follow in order to achieve results shielded from criticism. These include the preliminary adequacy tests, the measurement level of the observed variables, and the correlation method to be used; in sum, all of the statistical analyses required to validate the test. In turn, psychometrics does not operate in a vacuum, but is largely an applied branch of psychological science. When psychometricians start analyzing the internal structure of an instrument, one of their main goals is to build bridges that allow the results to be interpreted from theoretical perspectives about the cognitive or affective processes assumed to have driven the observed responses. Statistical assumptions very often do not properly fit psychological data. Consequently, statisticians and psychometricians might get involved in an eternal tug of war regarding how to conduct the analytical process. From the psychometric perspective, the violation of some statistical assumptions is insignificant if it can unveil or clarify the relationship between the data obtained by a psychological test and the theory underlying its development. In the exact opposite direction (not always, but often), statisticians marshal a body of evidence pointing out that such lenient decisions are instances of researcher degrees of freedom, are prone to false positives, and offer no secure shelter from the replicability crisis (Simmons et al., 2011; Wigboldus & Dotsch, 2016).

This debate is still ongoing with no end in sight, and many notable debaters are involved. Nevertheless, another important face of this relationship emerges when exploratory data analysis offers a wide plethora of possibilities, even when adequate statistical analyses are applied to psychological data. The decision here is not a matter of "if" or "how much" violation is acceptable; such decisions can be moot. Once the known statistical assumptions are met, the decision-making process can be almost entirely transformed into a matter of convenience, in which the analyst can first write the desired conclusion and then perform some mathematical gymnastics in order to lend it an air of scientific discovery.

From the statistical angle, methods to analyze the internal structure of a psychological test are based on multivariate analysis (MA). MA refers to all statistical techniques that simultaneously analyze multiple measurements on the individuals or objects under investigation, and it offers a set of efficient tools for analyzing complex correlation patterns (Hair et al., 2014). Exploratory Factor Analysis (EFA) and Principal Component Analysis (PCA) are often seen as the traditional (data reduction) methods used in this regard; indeed, some consider factor analysis to be a generalization of PCA (Mukherjee et al., 2018). Network (psychometric) analysis (NA) has also recently emerged as a qualified candidate for this procedure (Schmittmann et al., 2013). Although there is partial agreement that PCA should technically not be used with psychological data, the method, due to its similarity to EFA and its lower computational requirements, is still present in the literature (Kim, 2008; Norris & Lecavalier, 2010) and plays a pivotal role in MA.

Within the EFA framework, all models assume that variability can be partitioned into common and unique elements, with the unique variance further broken down into specific and error variance. EFA analyzes the common variance, as its statistical objective is to reduce a large number of variables to a smaller number of factors or, in other words, to identify the number and nature of the factors which can explain the shared variability in a set of observed indicators (Bandalos & Finney, 2018; Preacher et al., 2013; Woods & Edwards, 2007). This is, theoretically, the concept of communality, which represents the proportion of the observed variance due to common factors, or the total amount of variance in an item explained by the extracted factors (Macciotta et al., 2012). In another direction, PCA is purely a data reduction method, which assumes that all variance is common or shared, with no such division. As summed up by Maxwell, there is no need for an implicit hypothesis about the covariance structure of the variables (Franco & Marradi, 2013). Therefore, PCA models do not take the unique variance into account, but rather assume that the total variance is equal to the common variance. The procedure operates through linear combinations of the observed variables, aiming to create components which summarize the original data while preserving as much information as possible. In contrast, PCA aims to maximize variance rather than reproduce correlations as is done in EFA (Bandalos & Finney, 2018; Norris & Lecavalier, 2010).
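To make this contrast concrete, the sketch below fits both models to the same ordinal items with the psych package in R. It is a minimal illustration, not the analysis pipeline of this study: the data frame items and the choice of two factors are placeholder assumptions.

    # Sketch: EFA vs. PCA on the same ordinal items (psych package).
    # 'items' is a placeholder data frame of ordinal responses.
    library(psych)

    # EFA models only the common variance (reduced correlation matrix);
    # polychoric correlations suit ordinal data, Promax allows correlated factors.
    efa_fit <- fa(items, nfactors = 2, fm = "wls", rotate = "Promax", cor = "poly")

    # PCA treats all variance as common and builds linear composites.
    pca_fit <- principal(items, nfactors = 2, rotate = "Promax", cor = "poly")

    # Communality (h2): proportion of each item's variance explained.
    round(efa_fit$communality, 2)
    round(pca_fit$communality, 2)

Because PCA ignores the common/unique partition, its loadings and communalities are typically inflated relative to EFA on the same data.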

In recent years, models developed within network analysis have been proposed as an alternative way of looking at psychometric problems. The mechanics behind NA are based on mutually reinforcing variables, in contrast to a latent causal relationship. This field has grown rapidly (Contreras et al., 2019) and provides evidence for those who assume that this analysis offers "an entirely different conceptualization of why psychometric variables cluster in the way that they do" (Epskamp et al., 2017). However, some debates remain open, probably due to their recency. For instance, the statistical equivalence of models obtained via structural equation modeling (SEM) and Item Response Theory (IRT) collides with the proposal's claim to innovation, and some replicability issues introduce a dose of skepticism about the results obtained via NA (Forbes et al., 2017; van Bork et al., 2019).

In view of the above, in this study we have utilized a dataset composed of a large number of children assessed with the Ages & Stages Questionnaires: Social-Emotional (ASQ:SE) (Anunciação, Squires, et al., 2019) to present and discuss several alternative results for its internal structure. To a lesser extent, this manuscript confronts the widely disseminated idea that exploratory factor analysis is an entirely data-driven approach, and raises questions regarding the relationship between theories and analytical approaches in psychometrics.

Method

The current work is part of a broader study measuring the psychometric properties of the Brazilian version of the Ages and Stages Questionnaire system. The project ran from 2010 to 2012 in Rio de Janeiro, with the implementation of a large-scale developmental assessment of children from 6 months to 5 years of age. All data used in this study are available online at https://osf.io/z6gwv/.

Participants

The original sample consisted of 22,331 60-month-old children who were enrolled in all 468 public daycare centers and preschools in the city of Rio de Janeiro, Brazil, in 2011. Demographic details and ASQ:SE item-level scores are presented elsewhere (Anunciação, Squires, et al., 2019). A total of 500 children were randomly selected for the analyses. This subsample was formed of 276 males (55%) and 224 females (45%). The mean score obtained by these participants was 41 (SD = 36.4), below the cut-off score for developmental delays.

Ages and Stages Social-Emotional Questionnaire (ASQ:SE)

The Ages and Stages Social-Emotional Questionnaire (ASQ:SE) was developed to be a low-cost and psychometrically sound screening tool which can accurately reflect the emotional and social competence of infants, toddlers, and preschool-age children (Squires et al., 2001). The ASQ:SE items were intended to measure seven behavioral areas within the social and emotional skills: self-regulation, compliance, communication, adaptive learning, autonomy, affect, and interaction with people. The content of all items can relate to competence or to problem behaviors, and the system includes questionnaires based on the child's age: 6, 12, 18, 24, 30, 36, 48, and 60 months. It was originally designed to be completed by parents, but recent studies have relied on data gathered from teachers and other caregivers. Despite these seven conceptual domains, a growing body of evidence, mainly based on factor analysis, supports the view that the data obtained through the ASQ:SE form a two-dimensional structure which reflects the instrument's name (social and emotional) (Anunciação, Chen, et al., 2019; Chen et al., 2016). The instrument is scored by reversing some items and then summing the responses. The Cronbach's alpha coefficient in this study was 0.86 [95% CI, 0.85-0.88].
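As an illustration of this scoring rule, the sketch below reverses a placeholder set of items and computes the total score and Cronbach's alpha in R; the reversed-item indices are hypothetical, not the official ASQ:SE scoring key.

    # Sketch of the scoring rule: reverse some items, then sum the responses.
    # 'items' holds responses coded 1-3; 'rev_idx' is a hypothetical key.
    library(psych)

    rev_idx <- c(1, 4, 8)                  # placeholder reversed items
    scored  <- items
    scored[rev_idx] <- 4 - items[rev_idx]  # reverse the 1-3 coding: 1 <-> 3

    total <- rowSums(scored)               # total ASQ:SE score per child
    psych::alpha(scored)                   # internal consistency estimate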

Analysis

All analyses were conducted on a random subset of the main dataset comprising 500 participants. No missing cases, inconsistencies, or outliers were found, and the responses to all items were coded as 1, 2, and 3. A polychoric correlation matrix was computed because of the ordinal level of the items, and used as input in the subsequent psychometric analyses (Garrido et al., 2013).
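The two preparation steps can be sketched in R as follows; asqse is a placeholder name for the full item-level dataset, and the seed is arbitrary.

    # Sketch: draw the 500-case subsample and compute polychoric correlations.
    library(psych)

    set.seed(2011)                                  # arbitrary seed
    subsample <- asqse[sample(nrow(asqse), 500), ]  # random subset of cases

    # Polychoric correlations are appropriate for ordinal (1-3) responses.
    R <- psych::polychoric(subsample)$rho           # input for later analyses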

There are a number of statistical recommendations in EFA and PCA to guide the number of factors or components to retain. Several suggestions were taken into account because this study is an exploratory endeavor. First, the scree plot and parallel analysis (PA) were employed. In addition, the Hull method for selecting the number of common factors was computed. The first two methods are graphically similar, as they rely on a plot with the eigenvalues (variance explained) presented in descending order. However, while the results of the scree plot depend only on the current data, the machinery underlying PA generates simulated datasets with N observations randomly sampled from the variables, iteratively extracts the eigenvalues, orders them from largest to smallest, and computes summary statistics of these simulated results to compare with the observed ones (Braeken & van Assen, 2017).
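In R, both checks can be obtained in one call; a minimal sketch using the polychoric matrix R from the previous step (psych's fa.parallel draws the scree plot and overlays the simulated eigenvalues):

    # Sketch: scree plot plus Horn's parallel analysis for factors and components.
    library(psych)

    fa.parallel(R, n.obs = 500, fa = "both", fm = "wls")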

Using a different strategy, the Hull method is based on a numerical convex-hull heuristic and can be regarded as a generalization of the scree test, in which a plot is built with a goodness-of-fit (GoF) measure versus the degrees of freedom (df). The heuristic aims to identify the best balance between goodness-of-fit and df in an iterative fashion, as follows: 1) the range of factors to be considered is determined; 2) the goodness-of-fit of a series of factor solutions is assessed; 3) the degrees of freedom of the series of factor solutions are computed; and 4) the elbow is located on the upper boundary of the convex hull in the hull plot (Lorenzo-Seva et al., 2011).

The solutions recommended by each method were then estimated from the polychoric matrix. Although the number of factors varied, weighted least squares (WLS) was defined as the default factor estimation method (Baghdarnia et al., 2014; Forero et al., 2009), with Promax oblique rotation applied to the solutions obtained from the scree plot and the PA recommendations. Oblique rotations enable factors to be correlated, which is in line with the underlying phenomena. In contrast, Robust Unweighted Least Squares was used with the Hull method, as it is more often suggested for it (Yang-Wallentin et al., 2010). Another set of analyses was performed via NA, specifically Exploratory Graph Analysis (EGA). This analysis relied on the Gaussian Graphical Model (GGM) as its estimation method and on the graphical least absolute shrinkage and selection operator (glasso) as the detection algorithm. The results were also plotted in an easy-to-read network graph, wherein items in each dimension are color-coded and edges are the partial correlations between two nodes given all other nodes in the network (Golino & Epskamp, 2017).
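The EGA step reduces to a single call in the EGAnet package; a sketch, with the model argument making the GGM/glasso choice explicit:

    # Sketch: Exploratory Graph Analysis on the 500-case subsample.
    library(EGAnet)

    ega_fit <- EGA(subsample, model = "glasso", plot.EGA = TRUE)
    ega_fit$n.dim   # number of estimated dimensions
    ega_fit$wc      # item-to-dimension memberships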

The psychometric analyses were carried out in R 4 (R Development Core Team, 2020), with the tidyverse, psych, and EGAnet packages, and in Factor 10.10 (Lorenzo-Seva & Ferrando, 2006).

Results

One way to determine the number of factors or components in a data frame or a correlation matrix is to examine the scree plot of the successive eigenvalues. In this plot, the factor number is plotted on the x-axis and includes all items of the measure; as the ASQ:SE 60-month questionnaire is formed of 32 items, the horizontal axis goes up to 32. The eigenvalue is plotted on the y-axis and represents the proportion of variance explained by each factor or component. Multiple recommendations exist for the ideal number of factors to retain. The elbow rule is probably the simplest and suggests that sharp breaks in the plot can be used to determine the appropriate number of factors or components to extract. Despite its subjectivity, this rule may conclude in favor of two factors. In addition, the scree plot can also display the results of PA. This method compares the scree plot of the observed data with that of a random data matrix of the same size as the original. Its use is widely recommended, but recent studies have suggested changing some of its decision rules (Braeken & van Assen, 2017). The PA herein suggested the retention of 3 factors. Figure 1 displays these results.

Figure 1. Scree plot with the EFA and PCA results.

Another recommendation is based on the Kaiser criterion, also known as the eigenvalue-greater-than-one rule. This criterion suggested the retention of 3 factors, which are those above the dashed line in the figure. Conversely, PCA is a variable reduction procedure rather than a recommended statistical analysis within psychometrics. However, its use is still frequent, and it is a common feature in commercial software. The PCA suggestion was based on the PA method and recommended the retention of 2 components. A relatively recent method of factor retention is the Hull method. At the heart of this method is an optimal balance between fit and number of parameters (Lorenzo-Seva et al., 2011); it suggested retaining one single factor. This unidimensional solution achieved a GoF of 0.936, with a df of 464, and is described in Table 1.

Table 1 - Hull method results.

Factors   GoF     df    Scree test value
0         0       496   0
1         0.936   464   17.248*
2         0.989   433   6.353
3         0.997   403   3.779
4         0.999   374   1.726
5         1       346   0
6         1       319
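The scree test values in Table 1 can be recovered from the GoF/df pairs with the Hull index of Lorenzo-Seva et al. (2011), st_i = ((f_i - f_{i-1}) / (df_{i-1} - df_i)) / ((f_{i+1} - f_i) / (df_i - df_{i+1})); a minimal sketch in R, where the small discrepancies from Table 1 come from the GoF values being rounded to three decimals:

    # Sketch: Hull scree test values computed from the Table 1 GoF/df pairs.
    gof <- c(0, 0.936, 0.989, 0.997, 0.999, 1, 1)
    df  <- c(496, 464, 433, 403, 374, 346, 319)

    st <- sapply(2:5, function(i) {        # solutions with 1 to 4 factors
      ((gof[i] - gof[i - 1]) / (df[i - 1] - df[i])) /
      ((gof[i + 1] - gof[i]) / (df[i] - df[i + 1]))
    })
    round(st, 2)                           # ~17.1, 6.4, 3.9, 1.9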

Each of these recommendations was fitted to the data, and Table 2 reports some of the results. The loadings (either factor or component) represent the pattern of relationships between the common factors and the indicators. In turn, the cumulative proportion is an index of how much of the variance can be accounted for by the extracted factors or components, and it is based on the average communality.
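A small sketch of how these quantities, and the item-to-factor assignment reported in the last columns of Table 2, follow from a loading matrix; here L stands for any of the rotated solutions (the sum-of-squares formula is exact for orthogonal solutions and only approximate after an oblique rotation), and 0.32 is the conventional threshold discussed next:

    # Sketch: variance accounted for and main-factor assignment from loadings.
    L <- efa_fit$loadings                # e.g., from the earlier fa() call

    prop_var <- colSums(L^2) / nrow(L)   # proportion of variance per factor
    cum_var  <- sum(prop_var)            # cumulative proportion

    # Assign each item to the factor with its highest absolute loading,
    # flagging items whose best loading falls below 0.32.
    main_factor <- apply(L, 1, function(x) which.max(abs(x)))
    weak_item   <- apply(L, 1, function(x) max(abs(x)) < 0.32)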

In a multidimensional factor-analytic solution, the items tend to have loadings on all available factors. Some authors argue that loadings should be at least 0.32 to be interpretable (Tabachnick & Fidell, 2001), and that the factor on which an item has its highest loading should be considered its best candidate in future analyses (Costello & Osborne, 2011); this assignment is presented in the last three columns of the next table.

Table 2 - Factor solutions in the exploratory factor analyses.

ASQ:SE item              Hull f1 | Elbow f1    f2    | PA f1    f2     f3    | PCA f1   f2   | Main: Elbow  PA   PCA
q1-look_at               0.309   |  0.180    0.600   |  0.570   0.180  0.080 |  0.28   0.67  |  f2   f1   f2
q2-cling                 0.603   | -0.570   -0.230   | -0.110  -0.010  0.610 |  0.55   0.12  |  f1   f3   f1
q3-be_hugged             0.564   | -0.190   -0.680   | -0.540   0.250  0.380 |  0.07   0.65  |  f2   f1   f2
q4-play_adults           0.557   | -0.180   -0.620   | -0.600  -0.070  0.080 |  0.09   0.62  |  f2   f1   f2
q5-calm_down             0.306   |  0.550    0.190   |  0.200   0.270  0.370 |  0.60   0.31  |  f1   f3   f1
q6-friendly_strangers    0.605   | -0.610   -0.340   | -0.260   0.120  0.520 |  0.55   0.21  |  f1   f3   f1
q7-settle_down           0.244   |  0.630    0.160   |  0.040   0.650  0.120 |  0.69   0.30  |  f1   f2   f1
q8-seem_happy            0.343   | -0.030   -0.700   |  0.710   0.080  0.100 |  0.08   0.74  |  f2   f1   f2
q9-tantrums              0.238   |  0.680    0.090   |  0.160   0.180  0.580 |  0.73   0.24  |  f1   f3   f1
q10-interest             0.245   | -0.060   -0.770   |  0.780   0.050  0.050 |  0.05   0.83  |  f2   f1   f2
q11-bathroom             0.043   | -0.300    0.260   |  0.420   0.240  0.560 |  0.35   0.34  |  f1   f3   f1
q12-eating_problems      0.364   | -0.550    0.040   |  0.050   0.040  0.550 |  0.54   0.07  |  f1   f3   f1
q13-stay_activities      0.202   |  0.470    0.360   |  0.240   0.560  0.040 |  0.56   0.48  |  f1   f2   f1
q14-mealtime             0.542   | -0.020   -0.570   |  0.520   0.100  0.060 |  0.08   0.58  |  f2   f1   f2
q15-do_what_asked        0.108   |  0.550    0.370   |  0.220   0.670  0.020 |  0.64   0.50  |  f1   f2   f1
q16-active               0.472   | -0.780   -0.270   | -0.310   0.520  0.350 |  0.77   0.12  |  f1   f2   f1
q17-sleep                0.543   |  0.270    0.190   |  0.210   0.100  0.220 |  0.31   0.25  |  f1   f3   f1
q18-needs                0.313   | -0.060    0.660   |  0.710   0.120  0.230 |  0.17   0.68  |  f2   f1   f2
q19-feelings             0.405   | -0.060   -0.730   |  0.700   0.030  0.030 |  0.05   0.76  |  f2   f1   f2
q20-move_activity        0.197   |  0.660    0.170   |  0.080   0.570  0.210 |  0.71   0.31  |  f1   f2   f1
q21-explore              0.544   |  0.000    0.540   |  0.550  -0.020  0.070 |  0.09   0.57  |  f2   f1   f2
q22-do_over              0.257   | -0.690    0.110   |  0.030   0.020  0.720 |  0.67   0.04  |  f1   f3   f1
q23-hurt                 0.000   |  0.750    0.030   |  0.130   0.130  0.700 |  0.70   0.20  |  f1   f3   f1
q24-follow_rules         0.163   | -0.600    0.260   |  0.070   0.820  0.060 |  0.67   0.39  |  f1   f2   f1
q25-destroy              0.197   | -0.790    0.000   |  0.060   0.550  0.360 |  0.82   0.16  |  f1   f2   f1
q26-stay_away            0.512   |  0.360    0.200   |  0.100   0.440  0.010 |  0.40   0.28  |  f1   f2   f1
q27-show_concern         0.479   | -0.210    0.460   |  0.300   0.530  0.200 |  0.30   0.53  |  f2   f2   f2
q28-like_your_child      0.144   |  0.360    0.520   |  0.430   0.420  0.050 |  0.45   0.62  |  f2   f1   f2
q29-play_children        0.069   |  0.030    0.740   |  0.720   0.060  0.040 |  0.15   0.78  |  f2   f1   f2
q30-hurt_adults          0.334   | -0.710    0.010   |  0.120   0.700  0.140 |  0.73   0.16  |  f1   f2   f1
q31-take_turns           0.252   |  0.620    0.210   |  0.050   0.730  0.030 |  0.67   0.34  |  f1   f2   f1
q32-sexual               0.546   | -0.590   -0.290   | -0.390   0.580  0.090 |  0.53   0.16  |  f1   f2   f1
Proportion var           0.14    |  0.24     0.19    |  0.17    0.18   0.12  |  0.26   0.22  |
Cumulative                       |           0.43    |                 0.47  |         0.48  |
Note: The item contents were shortened for display.

Lastly, the EGA returned six clusters, as presented in Figure 2. This method is part of network psychometrics and has recently gained visibility. Its interpretation is directly related to its statistical machinery: edges correspond to partial correlation coefficients between two variables after conditioning on all other variables in the network, and a latent causal mechanism is not strictly necessary (Schmittmann et al., 2013).

Figure 2. EGAnet results.

Discussion

The increasing use of psychological tests in the last few decades would surprise anyone who lived in the era of Galton and Cattell. At that time, these tools were almost entirely based on sensory discrimination and reaction time, with little to no appeal for routine use (Gregory, 2014). This growth is partly explained by the fact that a test comes with the endeavor to produce evidence-based results regarding a psychological phenomenon, so long as it provides a solid set of evidence about the phenomenon's nature and its relationship with other characteristics (Bornstein, 2017). If this attempt succeeds, its achievements pave the route to develop, modify, or even suppress inconsistent psychological theories.

Another almost exponential growth in psychometrics and statistics is the availability of algorithms and computational strategies to analyze the internal structure of a psychological test. We have quickly moved from the seminal publication of Charles Spearman in 1904 to a new reality in which personal computers are cheap and widely available to researchers and students. This technological revolution also brought an almost infinite number of possibilities for statistical analysis (Revelle, 2015), with a dynamic and vivid forum of discussion with regard to best practices and guidelines. However, one of the implicit points of view we present herein is that, despite the value of these computational methods, their capacity to deal with (or solve) questions about the nature of psychological phenomena follows a logarithmic-like function whose asymptote has already been reached.

With that said, this research sought to present and discuss several statistical solutions from a previously published psychometric study. The data were derived from a large-scale assessment program in which the ASQ:SE was used with children aged 60 months as a screening test for identifying developmental delays in social and emotional aspects. While the previous work evidenced a multifactorial solution formed of two main factors (labeled social and emotional), in the current study we have shown that a plethora of alternative solutions may arise when changing the statistical method. We reproduced the published results, but also dug

into alternative models such as PCA and NA, in addition to estimating competing solutions within the EFA framework.

The study of the internal structure of the ASQ:SE is valuable as it summarizes the response patterns, making it possible to understand the hidden psychological process that drives the responses (Rios & Wells, 2014). When studying the internal structure of a psychometric instrument, we are also exploring the underlying psychological constructs. From a perspective which considers an underlying causal mechanism responsible for the shared variance of the indicators, EFA is a suitable method to unveil the nature and number of the factors, as well as the relationships among them (Bandalos & Finney, 2018; Preacher et al., 2013; Woods & Edwards, 2007). Although EFA is considered a data-driven approach, there is not much room for believing that such methods are capable of solving most problems related to the psychological process without deliberate reasoning. Some still argue in favor of using an "unrestricted model" instead of EFA.

One of the most difficult decisions when conducting a factor analysis is how many factors should be retained. From a purely statistical perspective, the multiple and sometimes opposite stopping rules which arise from different methods unveil the uncertainty within this strategy. We have shown here that the computational solution is a weak candidate for understanding psychological data when not guided by the objectives or by the reasons that motivated the test's development. No extra effort is required with the ASQ:SE data to conclude that statistical solutions varying from 1 to 4 factors should be taken with caution.

The Hull method was the only one which achieved a unidimensional solution. Despite the inclusion of all ASQ:SE items, the factor loading even dropped to 0 for the item exploring the child's self-injurious behaviour. As the greater the loading, the better the item discrimination, this solution leads to the conclusion that being too friendly with strangers has the strongest correlation with this factor. The two-dimensional solution was obtained through the elbow criterion. In summary, this rule consists in identifying the biggest drop in the eigenvalues; its simplicity comes at the cost of subjective interpretation and harsh criticism. Not surprisingly, the PCA and EFA solutions completely agreed in placing each item on the factor with its greatest loading, with results reproducing some previously published evidence (Anunciação, Chen, et al., 2019; Anunciação, Squires, et al., 2019). The inflation of the PCA loadings compared to the EFA loadings occurs as a consequence of how the variance is handled. As noted at the beginning, from the EFA perspective the variance can be partitioned into two types (common and unique), and its models work with a reduced correlation matrix. On the other hand, PCA assumes that all of each variable's variance is common variance, and consequently "includes" both common and unique variance in the analysis (Santos et al., 2019). As previously published, the two-dimensional EFA solution met the statistical requirements and also enhanced the interpretability of the results.
Previous evidence suggested that behaviors such as "play with other children", "explore new places", and demonstrating "interest in things around" could reflect social skills, whereas being able "to settle down after exciting activity", "purposeful hurt", and "clinging more than is expected" could reflect emotional skills. As these two competencies are correlated, overlaps may occur (Anunciação, Squires, et al., 2019). Regarding the PA results (in which three factors were suggested), previous evidence has offered slightly dissonant guidance. At the same time that PA is considered the most accurate methodology for determining the number of factors to retain, others have concluded

that overfactoring may also be present (Garrido et al., 2013; Glorfeld, 1995; Steger, 2006). In this study, the third factor that emerged from the PA brought together items related to the child's clinging more than expected, as well as self-regulation, sleeping problems with the presence of tantrums, and eating problems.

The last statistical method carried out was the EGA. Its clusters varied in terms of item quantity and theoretical perspective. Tantrums, sleeping problems, and the ability to calm down independently when upset formed the group with the fewest items (in purple in Figure 2). On the other hand, items related to pervasive symptoms and aggressiveness grouped together into a specific cluster (in red), which is also true for items related to affective behavior and communication (in yellow).

Some limitations are present in this study. From the statistical point of view, as the results were not based on a dataset with a known data-generating process (such as a simulated dataset) but rather on an empirically derived dataset, the solutions cannot be contrasted with a true model. Another condition is the widely recognized assumption that different exploratory models within statistics will often give different outputs, and therefore their conclusions are expected to vary. In addition, the results could have changed if the chosen extraction method or the rotation strategy were modified. On the other hand, this kind of data is what psychometricians deal with most of the time, with no privilege of checking the true data-generating process. Therefore, despite the clear limitations, we introduced a common scenario experienced when the study of the internal structure of a test is undertaken.

The analysis of these results opens an avenue for several endpoints, but no one denies the close relationship between statistics and psychometrics, or the importance that statistics and quantitative methods have in Psychology. Some authors even suggest that this relationship might work similarly to a front-end vs. back-end hierarchy. While some psychometric conceptualizations, such as reliability, can be defined from the front-end, conceptual perspective as the degree to which measurement scores stay consistent or repeatable, the back-end is often produced by a set of correlations among the observed variables within a psychological tool. The same scenario occurs when psychologists name a set of common variance between items as a domain or construct (Coetzee & Van der Merwe, 2010; Furr & Bacharach, 2014).

However, when it comes to studies of the internal structure of a test, this pseudo-hierarchy melts away to reveal that the conceptualization of exploratory factor analysis from a purely mechanical and theoretical perspective does not resemble the reality of psychometrics. This pursuit is routine for psychometricians, with results that achieve major importance not only from methodological perspectives but also from the theoretical basis. The multiple answers derived from different procedures and techniques can be seen as a reinforcement of researcher degrees of freedom, but also as a key to improving the signal-to-noise ratio, thereby giving the opportunity to understand the data in line with the theories used as the driving force to develop the test. However, this is a long-standing debate in statistics and psychometrics, and its end is still far from being attained.

Conclusion

Little is known in psychological assessment about the long route that researchers need to follow between conceptualizing a new tool (such as a test or an inventory) and its clinical use. When an instrument is conceptualized, a set of studies integrating theoretical assumptions, statistical requirements, and psychometric analyses must be carried out to gather evidence that the tool assesses what it is intended to assess and produces reliable and stable results. In the absence of psychometric studies, the results obtained by this new tool lack

scientific interpretation, are limited, and should not be used (AERA et al., 2014). One of the main reasons for this is that psychologists assume that the responses obtained through a psychological tool may offer the grounds to access latent variables, also known as constructs, and add information for a more precise clinical judgment. One fundamental aspect of this route is analyzing the internal structure of a psychological test. At the same time that this endeavor joins statistics and psychometrics, it also nurtures a reality in which different perspectives are floated as the basis of the decision process. In this manuscript, we sought to demonstrate this environment with the data gathered by the ASQ:SE, and also to contribute to the discussion about the use of factor analysis and network analysis in psychometrics.

Finally, two main conclusions are suggested by this study. First, statistical assumptions are sometimes broken in psychometrics in order to enhance the signal-to-noise ratio and make it possible to analyze the results gathered by a psychological tool from a theoretical perspective. Second, we tried to demonstrate that even when statistical assumptions are met, the results can be used in a convenient fashion. We describe these two conditions from a prospective framework, in which the development of science is the development of measures, and time will reveal their usefulness.

References

AERA, APA, & NCME. (2014). Standards for educational and psychological testing. American Educational Research Association.

Anunciação, L., Chen, C.-Y., Pereira, D. A., & Landeira-Fernandez, J. (2019). Factor structure of a social-emotional screening instrument for preschool children. Psico-USF, 24(3), 449–461. https://doi.org/10.1590/1413-82712019240304

Anunciação, L., Squires, J., Clifford, J., & Landeira-Fernandez, J. (2019). Confirmatory analysis and normative tables for the Brazilian Ages and Stages Questionnaires: Social-Emotional. Child: Care, Health and Development. https://doi.org/10.1111/cch.12649

Baghdarnia, M., Soreh, R. F., & Gorji, R. (2014). The comparison of two methods of maximum likelihood (ML) and diagonally weighted least squares (DWLS) in testing construct validity of achievement goals. Journal of Educational Management Studies.

Bandalos, D. L., & Finney, S. J. (2018). Factor analysis. In The reviewer's guide to quantitative methods in the social sciences (pp. 98–122). Routledge. https://doi.org/10.4324/9781315755649-8

Bornstein, R. F. (2017). Evidence-based psychological assessment. Journal of Personality Assessment, 99(4), 435–445. https://doi.org/10.1080/00223891.2016.1236343

Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71(356), 791. https://doi.org/10.2307/2286841

Braeken, J., & van Assen, M. A. L. M. (2017). An empirical Kaiser criterion. Psychological Methods, 22(3), 450–466. https://doi.org/10.1037/met0000074

Chen, C., Filgueiras, A., Squires, J., & Landeira-Fernandez, J. (2016). Examining the factor structure of an early childhood social emotional screening assessment. Journal of Special Education and Rehabilitation, 17(3–4), 89–104. https://doi.org/10.19057/jser.2016.12

Coetzee, S., & Van der Merwe, P. (2010). Industrial psychology students' attitudes towards statistics. SA Journal of Industrial Psychology, 36(1). https://doi.org/10.4102/sajip.v36i1.843

Contreras, A., Nieto, I., Valiente, C., Espinosa, R., & Vazquez, C. (2019). The study of psychopathology from the network analysis perspective: A systematic review. Psychotherapy and Psychosomatics, 88(2), 71–83. https://doi.org/10.1159/000497425

Costello, A., & Osborne, J. (2011). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10.

Epskamp, S., Maris, G., Waldorp, L. J., & Borsboom, D. (2017). Network psychometrics. In The Wiley handbook of psychometric testing: A multidisciplinary reference on survey, scale and test development. https://doi.org/10.1002/9781118489772.ch30

Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017). Evidence that psychopathology symptom networks have limited replicability. Journal of Abnormal Psychology, 126(7), 969–988. https://doi.org/10.1037/abn0000276

Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling: A Multidisciplinary Journal, 16(4), 625–641. https://doi.org/10.1080/10705510903203573

Franco, G. Di, & Marradi, A. (2013). Factor analysis and principal component analysis. FrancoAngeli.

Furr, R. M., & Bacharach, V. R. (2014). Psychometrics: An introduction (2nd ed.). SAGE Publications.

Garrido, L. E., Abad, F. J., & Ponsoda, V. (2013). A new look at Horn's parallel analysis with ordinal variables. Psychological Methods, 18(4), 454–474. https://doi.org/10.1037/a0030005

Glorfeld, L. W. (1995). An improvement on Horn's parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55(3), 377–393. https://doi.org/10.1177/0013164495055003002

Golino, H. F., & Epskamp, S. (2017). Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research. PLOS ONE, 12(6), e0174035. https://doi.org/10.1371/journal.pone.0174035

Gregory, R. J. (2014). Psychological testing: History, principles and applications. Allyn and Bacon.

Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2014). Multivariate data analysis (7th ed.). Pearson.

Kim, H. J. (2008). Common factor analysis versus principal component analysis: Choice for symptom cluster research. Asian Nursing Research. https://doi.org/10.1016/S1976-1317(08)60025-0

Lorenzo-Seva, U., & Ferrando, P. J. (2006). FACTOR: A computer program to fit the exploratory factor analysis model. Behavior Research Methods, 38(1), 88–91. https://doi.org/10.3758/BF03192753

Lorenzo-Seva, U., Timmerman, M. E., & Kiers, H. A. L. (2011). The Hull method for selecting the number of common factors. Multivariate Behavioral Research, 46(2), 340–364. https://doi.org/10.1080/00273171.2011.564527

Macciotta, N. P. P., Cecchinato, A., Mele, M., & Bittante, G. (2012). Use of multivariate factor analysis to define new indicator variables for milk composition and coagulation properties in Brown Swiss cows. Journal of Dairy Science, 95(12), 7346–7354. https://doi.org/10.3168/jds.2012-5546

Mukherjee, S. P., Sinha, B. K., & Chattopadhyay, A. K. (2018). Factor analysis. In Statistical methods in social science research (pp. 103–111). Springer Singapore. https://doi.org/10.1007/978-981-13-2146-7_10

Norris, M., & Lecavalier, L. (2010). Evaluating the use of exploratory factor analysis in developmental disability psychological research. Journal of Autism and Developmental Disorders, 40(1), 8–20. https://doi.org/10.1007/s10803-009-0816-2

Preacher, K. J., & MacCallum, R. C. (2003). Repairing Tom Swift's electric factor analysis machine. Understanding Statistics. https://doi.org/10.1207/s15328031us0201_02

Preacher, K. J., Zhang, G., Kim, C., & Mels, G. (2013). Choosing the optimal number of factors in exploratory factor analysis: A model selection perspective. Multivariate Behavioral Research. https://doi.org/10.1080/00273171.2012.710386

R Development Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

Revelle, W. (2015). Package "psych": Procedures for psychological, psychometric and personality research. http://personality-project.org/r/psych-manual.pdf

Rios, J., & Wells, C. (2014). Validity evidence based on internal structure. Psicothema. https://doi.org/10.7334/psicothema2013.260

Santos, R. de O., Gorgulho, B. M., Castro, M. A. de, Fisberg, R. M., Marchioni, D. M., & Baltar, V. T. (2019). Principal component analysis and factor analysis: Differences and similarities in nutritional application. Revista Brasileira de Epidemiologia, 22. https://doi.org/10.1590/1980-549720190041

Schmittmann, V. D., Cramer, A. O. J., Waldorp, L. J., Epskamp, S., Kievit, R. A., & Borsboom, D. (2013). Deconstructing the construct: A network perspective on psychological phenomena. New Ideas in Psychology, 31(1), 43–53. https://doi.org/10.1016/j.newideapsych.2011.02.007

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science. https://doi.org/10.1177/0956797611417632

Squires, J., Bricker, D., Heo, K., & Twombly, E. (2001). Identification of social-emotional problems in young children using a parent-completed screening measure. Early Childhood Research Quarterly, 16(4), 405–419. https://doi.org/10.1016/S0885-2006(01)00115-6

Steger, M. F. (2006). An illustration of issues in factor extraction and identification of dimensionality in psychological assessment data. Journal of Personality Assessment, 86(3), 263–272. https://doi.org/10.1207/s15327752jpa8603_03

Stigler, S. M. (1992). A historical view of statistical concepts in psychology and educational research. American Journal of Education. https://doi.org/10.1086/444032

Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Allyn and Bacon.

van Bork, R., Rhemtulla, M., Waldorp, L. J., Kruis, J., Rezvanifar, S., & Borsboom, D. (2019). Latent variable models and networks: Statistical equivalence and testability. Multivariate Behavioral Research, 1–24. https://doi.org/10.1080/00273171.2019.1672515

Wigboldus, D. H. J., & Dotsch, R. (2016). Encourage playing with data and discourage questionable reporting practices. Psychometrika, 81(1), 27–32. https://doi.org/10.1007/s11336-015-9445-1

Woods, C. M., & Edwards, M. C. (2007). Factor analysis and related methods. In Handbook of statistics (pp. 367–394). https://doi.org/10.1016/S0169-7161(07)27012-9

Wright, D. B. (2009). Ten statisticians and their impacts for psychologists. Perspectives on Psychological Science, 4(6), 587–597. https://doi.org/10.1111/j.1745-6924.2009.01167.x

Yang-Wallentin, F., Jöreskog, K., & Luo, H. (2010). Confirmatory factor analysis of ordinal variables with misspecified models. Structural Equation Modeling: A Multidisciplinary Journal, 17(3), 392–423. https://doi.org/10.1080/10705511.2010.489003