<<

PSYCHOLOGICAL SCIENCE

Research Article Scholastic Assessment or g? The Relationship Between the Scholastic Assessment Test and General Cognitive Ability Meredith C. Frey and Douglas K. Detterman

Case Western Reserve University

ABSTRACT—There is little evidence showing the relationship be- Testing Service (ETS), the group that administers the SAT along with tween the Scholastic Assessment Test (SAT) and g (general in- countless other standardized tests for the College Board, began a telligence). This research established the relationship between program of validity research with the SAT, known as the Validity Study SAT and g, as well as the appropriateness of the SAT as a Service (VSS). This service, offered free to interested colleges, com- measure of g, and examined the SAT as a premorbid measure of putes the average multiple correlation between three predictor vari- . In Study 1, we used the National Longitudinal ables (SAT math score, SAT verbal score, and high school record) and Survey of Youth 1979. Measures of g were extracted from the grade point average (GPA) among college freshmen. Between the years Armed Services Vocational Aptitude Battery and correlated with 1964 and 1985, the average correlation between SAT score and SAT scores of 917 participants. The resulting correlation was freshman GPA was in the range from .38 to .46 (Morgan, 1990). De- .82 (.86 corrected for nonlinearity). Study 2 investigated the spite this history of analysis, and the that g is one of the correlation between revised and recentered SAT scores and most reliable predictors of academic achievement, ETS refuses to cite scores on the Raven’s Advanced Progressive Matrices among any SAT-IQ correlations (Seligman, 1994). The College Board presi- 104 undergraduates. The resulting correlation was .483 (.72 dent, Gaston Caperton, was recently quoted in an on-line publication corrected for restricted range). These studies indicate that the of U.S. News & World Report as admitting that ‘‘in its original form [the SAT is mainly a test of g. We provide equations for converting SAT] was an IQ test’’ (Barnes, 2002). However, the people responsible SAT scores to estimated IQs; such conversion could be useful for for developing and administering the SAT insist that, once logic items estimating premorbid IQ or conducting individual difference were replaced by reading comprehension in 1946, the SAT ceased to research with college students. measure intelligence (Barnes, 2002). Rather, the College Board cur- rently asserts that the SAT measures reasoning ability, which they hold as different from intelligence. The relationship between academic achievement and general intelli- The unusual distinction made by ETS and the College Board not- gence, or g, is well established in the field of intelligence (Jensen, withstanding, some researchers have voiced support for the SAT as a 1998). One particular test of academic achievement, however, is de facto intelligence test. In his 1994 book, A Question of Intelligence, conspicuously absent from these studies. The SAT (once called the Seligman noted, ‘‘Henry Chauncey, a former president of ETS has Scholastic Aptitude Test, and now the Scholastic Assessment Test), an been quoted (in Klitgaard, Choosing Elites, 1985, p. 92) as stating that exam that generations of high school juniors and seniors have ago- the SAT is essentially an intelligence test’’ (pp. 206–207). In fact, the nized over and that has spawned innumerable preparatory books and author of the original SAT, Carl Campbell Brigham, developed his test classes, has not, surprisingly enough, been frequently analyzed in from Army intelligence tests to more accurately differentiate among conjunction with measures of general cognitive ability. individuals at the upper end of the scale (Lemann, 1999). Although That is not to say that there is a lack of research in general on the the College Board asserts that the test is substantially different from SAT. In fact, to the contrary, the SAT’s relation to college admissions its original form, Lemann asserted that, other than having its language and achievement has been the focus of much research. A study of updated, the SAT has changed little since its original version. 25,000 applicants at nine private colleges yielded unsurprising re- Furthermore, empirical evidence suggests a substantial relationship sults: Two thirds of the variance in predicting admissions decisions between the SAT and g. In a study of 339 undergraduates, Brodnick could be accounted for by academic factors, split equally between and Ree (1995) used covariance structure modeling to examine the high school record and admissions test scores, including SAT scores relationship between psychometric g, socioeconomic variables, and (Willingham & Breland, 1982). Furthermore, in 1964, the Educational achievement-test scores. They found substantial general-factor load- ings on both the math (.698) and the verbal (.804) SAT subtests. Address correspondence to D.K. Detterman, Department of Psy- Because they used the SAT to develop their measure of g, it is not chology, Case Western Reserve University, Cleveland, OH 44106; clear if this general factor is the same as that obtained from standard e-mail: [email protected]. intelligence tests. If the general factors are indeed the same, then the

Volume 15—Number 6 Copyright r 2004 American Psychological Society 373 Scholastic Assessment Test and g

SAT may have been overlooked as a potentially useful measure of TABLE 1 general cognitive functioning. Factor Loadings From the of the Armed It would be of considerable benefit for researchers and clinicians to Services Vocational Aptitude Battery have an equation for estimating an individual’s IQ from an objective measure such as the SAT. In cases of suspected cognitive decline due Subtest Factor 1 Factor 2 to injury or illness, clinicians currently use a series of demographic General Science .881 .116 variables to predict their clients’ premorbid IQ. For many of these Arithmetic Reasoning .858 .045 individuals, the SAT may be the only objective measure of premorbid Word Knowledge .885 .071 functioning. The method of predicting IQ with demographic variables Paragraph Comprehension .825 .175 carries with it a standard error of prediction (SEp) of 11.4 (Karzmark, Numerical Operations .724 .426 Heaton, Grant, & Matthews, 1985). It appears that there is ample room Coding Speed .657 .417 for improving the accuracy with which clinicians currently estimate Auto and Shop Information .727 .425 premorbid IQ. Mathematics Knowledge .800 .129 In addition to the obvious clinical application, there is a substantial Mechanical Comprehension .794 .320 benefit to researchers’ use of a regression equation to estimate IQ. Electronics Information .829 .320 Much research in the field of psychology is carried out on under- graduate volunteers. Although there has been a recent push by the the ASVAB are shown in Table 1. They indicate a substantial loading University of California system to discontinue using SAT scores as an of all subtests of the ASVAB on a first factor, g. admission criterion, many colleges and universities continue to re- ASVAB first-factor scores were transformed to an IQ scale using the quire the SAT for admission (Barnes, 2002). Consequently, there is a following equation: IQ 5 (z 15)1100. Finally, the IQ scores derived vast pool of potential subjects who already have a good measure of from the ASVAB were correlated with SAT scores for the 917 re- their general cognitive functioning in their transcripts. If a researcher spondents who had scores on both measures. Simple correlations could estimate IQ with sufficient accuracy from the scores on record, it between both SAT scores and ASVAB IQ scores and scores on the would eliminate hours of additional testing in cases in which measures small-N intelligence tests were also analyzed. of general intelligence are required. To these ends, we examined the relationship between the SAT and general cognitive ability in two studies. We also developed regression equations to estimate IQ from Results the SAT. Of the 11,878 subjects who had scores on all ASVAB subtests, 917 also took the SAT and were the primary focus in this study. There was a significant correlation between SAT and ASVAB IQ (r 5 .820, STUDY 1 p < .001). However, when a scatter plot of the relationship was examined (see Fig. 1a), a nonlinear relationship was evident. To correct for Method nonlinearity, we added the squared SAT score (R5.857, p < .001) to Participants the regression. A cubic component of SAT also added significantly to 5 Data were extracted from the National Longitudinal Survey of Youth the prediction of IQ (p .011), but the added variance was so small 1979 (NLSY79) data set. The NLSY79 contained a national proba- that we decided to not include it in the regression equation. Figure 1b bility sample of civilian and military subjects aged 14 to 21, with an presents a scatter plot of IQ predicted from a regression equation with overrepresentation of certain minority groups. A total of 12,686 men the squared and cubic components of SAT versus IQ obtained from the and women participated in the NLSY79. first factor of the ASVAB. Entering math and verbal subtest scores into the regression separately, as opposed to using a composite score, did not improve the prediction of IQ. Procedure The simple correlations between SAT and IQ (or whatever measure From the NLSY79 data set, we extracted measures of performance on was reported for the IQ test) for small-N intelligence tests ranged from the Armed Services Vocational Aptitude Battery (ASVAB) and the .53 to .82 for participants who had taken the SAT and also had another SAT (known as the Scholastic Aptitude Test at that ), as well as a IQ test score reported in their school records. All simple correlations number of intelligence tests that did not have scores available for as between tests are summarized in Table 2. The SAT correlated signif- many subjects (‘‘small-N’’ tests). icantly (p < .05) with all six of the traditional intelligence tests ex- A principal-axis factor analysis was performed on the 10 subtests of amined (see the SAT column in Table 2), although these results must the ASVAB in order to derive a measure of g. This analysis included be interpreted with a degree of caution as some of the ns are quite 11,878 of the 11,914 subjects who had taken the ASVAB. The re- small. As the correlations in the ASVAB column indicate, the SAT maining 36 subjects were excluded because they were missing scores and the other six intelligence tests all correlated significantly with for 1 or more subtests. ASVAB scores were chosen in place of scores IQ derived from the first-factor score of the ASVAB (r 5.56–.83, on some of the more traditional intelligence tests because ASVAB p < .01). The SAT correlated more highly with the ASVAB first-factor scores were available for nearly all of the NLSY79 participants score (r 5 .82, p < .01) than with any other measure except for the (11,878 of 12,686). Furthermore, prior analysis of the ASVAB con- Coop School & College Test (r5.83, p < .01). This is strong evidence firmed a hierarchical g model in which 64% of the variance in the that the SAT is an intelligence test. ASVAB was due to a general factor (Ree & Carretta, 1994; see Roberts Based on the regression corrected for nonlinearity with the addition et al., 2000, for an alternative model). Results of the factor analysis of of the squared SAT component, we developed the following equation to

374 Volume 15—Number 6 Meredith C. Frey and Douglas K. Detterman

Fig. 1. Scatter plots of Scholastic Assessment Test (SAT) scores and IQ estimates: first-factor score (IQ scale) from the Armed Services Vocational Aptitude Battery (ASVAB) as a function of (a) SAT total score and (b) unstandardized predicted IQ based on SAT total score, SAT2, and SAT3 and (c) Raven’s Advanced Progressive Matrices score (IQ scale) as a function of SAT total score. predict IQ from SAT scores: tomary means of predicting IQ using a series of demographic variables. 0 ¼ð : Þþð : 2Þþ : ð Þ As mentioned earlier, the SEp using demographic variables is 11.4 XIQ 0 126 SAT 4 71E 5 SAT 40 063 1 (Karzmark et al., 1985), about twice as large as the SEp using the SAT.

The standard error of prediction (SEp) was 5.94. This process of esti- In order to test the equation for consistent prediction, we used a mating IQ from SAT scores appears much more accurate than the cus- jackknife procedure. A regression equation developed on a random

TABLE 2 Simple Correlations Among Tests

Test Coop Armed California Lorge- Henmon- School Services Test of Otis-Lennon Thorndike Nelson Test Differential & College Scholastic Vocational Mental Mental Intelligence of Mental Aptitude Ability Aptitude Aptitude Test Maturitya Ability Testa Testa Maturitya Testa Testa Test Batteryb Total n California Test .76nn .77 .88nn .58nn .89nn .82nn .78nn 599 of Mental (12) (6) (7) (25) (19) (31) (358) Maturitya Otis-Lennon .86nn .53 .74nn .61 .78nn .76nn 1,191 Mental Ability (27) (11) (85) (5) (79) (572) Testa Lorge-Thorndike .38 .55nn .49n .79nn .56nn 691 Intelligence (12) (64) (17) (29) (295) Testa Henmon-Nelson .53 .86nn .65nn .69nn 201 Test of Mental (7) (19) (15) (166) Maturitya Differential .77nn .78nn .75nn 569 Aptitude Testa (28) (68) (600) Coop School & .53n .83nn 164 College Ability (15) (162) Testa Scholastic .82nn 948 Aptitude Test (917) Armed Services 11,914 Vocational Aptitude Batteryb

Note. Numbers in parentheses indicate the n for each correlation. aCorrelations performed on percentile score. bFirst-factor scores from factor analysis of the subtests. np < .05. nnp < .01.

Volume 15—Number 6 375 Scholastic Assessment Test and g

sample of approximately 50% of the Study 1 (NLSY79) data was used in Figure 1c shows the simple correlation between SAT total scores to predict IQ in the remaining half of the sample. A second regression and IQ as obtained from the Raven’s APM. The mean SAT score of the equation was developed on the second half of the sample and used to sample in Study 1 was 854, whereas the mean SAT score of the sample predict IQ in the first half. The correlation between IQ predicted from in Study 2 was 1372. Furthermore, the standard deviation of the the first-half regression equation and IQ extracted from the ASVAB in sample in Study 2 was 119, and the standard deviation obtained in the second half of the data was .856 (p < .01). The correlation be- Study 1 was 226 (the standardization of the SAT had a standard de- tween IQ predicted from the regression equation developed on the viation of 200). These comparisons indicated a restricted range of the second half of the sample and IQ extracted from the ASVAB in the first Study 2 participants. We therefore corrected the Study 2 simple corre- half of the data was .860 ( p < .01). lation between SAT and Raven’s IQ (r 5.483) for restriction of range, simply to obtain an estimate of the correlation in a less restricted sample of college students. The resulting corrected correlation was Discussion .72. In addition, it is evident from Figure 1c that there is a ceiling on It is evident from these results that there is a striking relation between the Raven’s scores, which may have also suppressed the correlation. SAT scores and measures of general cognitive ability. In fact, when In contrast to what we found in Study 1, entering SAT math (SAT-M) one examines the results in Table 2, especially those in the ASVAB and SAT verbal (SAT-V) scores into the regression equation in place of column, it appears that the SAT is a better indicator of g, as defined by a composite, SAT total score, slightly improved the prediction of IQ. the first factor of the ASVAB, than are some of the more traditional The multiple R for SAT-M and SAT-V entered separately was .554 intelligence tests. Furthermore, jackknife procedures indicate highly (p < .001), or .745 when corrected for restriction of range. consistent results are obtained with Equation 1. We developed an equation for predicting IQ from SAT scores in the Since 1979, when the data used in this analysis were collected, the Study 2 data set. The resulting equation includes SAT-M and SAT-V SAT has undergone revision and recentering. Therefore, Equation 1 scores, in place of the total score, as neither the squared nor the cubic may not be applicable for a younger subject population. In Study 2, we component of SAT added significantly to the prediction of IQ (p > evaluated the usefulness of this equation for the revised and recen- .05). The following is the equation obtained: tered SAT. 0 ¼ð : Þþð : Þþ : ð Þ XIQ 0 095 SAT-M 0 003 SAT-V 50 241 2 STUDY 2 The standard error of prediction (SEp) was 9.76. Although this stan- Method dard error is higher than what was obtained in Study 1, it is none- theless more accurate than the 11.4 one would expect estimating IQ Participants using demographic variables. Part of the for the larger standard One hundred sixteen students from a private university were recruited error is the restricted range of Study 2. through the psychology subject pool. Of this total, 65 males and 39 Repeating the jackknife procedure used in Study 1 on the Study 2 females were included in the analysis; the remaining 12 students did data set (i.e., using equations developed on one half of the data to not have valid SAT scores on record at the university. predict IQ in the second half) yielded correlations of .523 (equation developed on first half) and .542 (equation developed on second half), Procedure respectively (both significant at the .01 level). These correlations Subjects were administered the Raven’s Advanced Progressive Ma- indicate that results obtained using Equation 2 are stable. trices (APM; 1962 revision; J.C. Raven, 1978), a test of nonverbal reasoning skills. The test was administered to subjects in groups of 10 to 25 individuals, and the sessions were untimed. Following test ad- Cross-Validation ministration, all participants’ SAT scores were obtained from college Both studies showed highly consistent results even when a jackknife admissions records. procedure was used. There was very little shrinkage in the correlations Raven’s APM scores were converted to percentiles using 1993 when regression equations were applied to new samples. This finding smoothed detailed U.S. norms (J. Raven, Raven, & Court, 1998). Z raised the question of whether it would be possible to develop a single scores were computed from the percentiles, then transformed to an IQ equation that would apply to both samples. To test this, we applied the scale with the same equation used in Study 1. IQs obtained in this equation from Study 1 to the data of Study 2 and the equation from manner were identical (within rounding error) to those obtained using Study 2 to the data of Study 1. If one equation can apply to both data Table APM36 of the Raven’s APM manual, which converts percentiles sets, then there should be very little shrinkage in correlations when to IQ (J. Raven et al., 1998, p. APM 102). Following the conversion to regression equations are applied across the studies. an IQ scale, we calculated a simple correlation between the Raven’s The equations developed from both studies were validated across APM and SAT total scores for the 104 subjects who had valid scores the data sets. Equation 1, developed on the NLSY79 data set, was on both measures. used to estimate IQ from the Study 2 SAT scores. These estimated IQs were then correlated with the Raven’s IQs of the Study 2 subjects, as well as the IQs we estimated from Equation 2. IQs predicted from Results Equation 1 using SAT scores from Study 2 correlated significantly with Preliminary box plots revealed an outlier in the data set, which was both the Raven’s IQs (r 5.481, p < .01) and the IQs predicted from excluded from further analysis. The correlation between the remaining Equation 2 (r 5.862, p < .01). Equation 2 was then used to estimate 103 SAT and APM scores, although significant, was lower than the IQ from the SAT scores of Study 1. These estimates were then cor- correlation obtained in Study 1 (r 5.483, p < .001). The scatter plot related with both ASVAB IQs (r 5 .482, p < .01) and IQs predicted

376 Volume 15—Number 6 Meredith C. Frey and Douglas K. Detterman

from Equation 1 (r 5 .526, p < .01). Not surprisingly, there was GENERAL DISCUSSION substantial shrinkage in the cross-validation. This shrinkage could indicate one of three things: First, the equation developed using the Overall, the results of these studies support two major findings. First, NLSY79 data set may not be valid, in any sample, for predicting IQ the SAT is an adequate measure of general intelligence, and second, it with the new recentered SAT scores. Second, the characteristics of the is a useful tool in predicting cognitive functioning when other esti- Study 2 sample (i.e., much higher mean and restricted range of the mates of intelligence are unavailable, too time-consuming, or too SAT scores compared with the Study 1 sample) may have resulted in costly. One implication of these results is that it might be more useful decreased effectiveness of the Study 1 equation in predicting IQ, but if the SAT were reported as a score on a general factor, plus separate the Study 1 equation might be applicable to other, less restricted math and verbal subscale scores.1 Using the regression equations samples. Third, the estimate of g used in Study 1 may be substantively presented here, SAT scores can be converted to estimates of IQ. These different from the measure of g in Study 2. estimates are especially useful in studies of college students when a rough measure of g is needed. Although it would be perfectly ac- ceptable to use SAT scores without conversion, conversion to an IQ Discussion score provides a basis for comparing studies. The comparison between Study 1 and Study 2 samples presents an Another application of the SAT-IQ conversion is as an estimate of interesting problem. Can one assume that the revised, recentered SAT premorbid intelligence, as often SAT scores are the only objective predicts g in the same manner that the previous version (used in the measure of premorbid intellectual functioning available, and the re- NLSY79 data set) did? Despite the differences in the samples and the sulting estimate of IQ is much more accurate than standard estimates best regression equations, there was still a strong correlation between based on demographic variables. We have provided two equations that the IQs predicted by the two equations. Unfortunately, the results of can be used to estimate IQ from SAT scores, depending on whether the the jackknife procedures indicate that the substantial shrinkage in scores are from before or after the 1994 recentering. The evidence cross-validation correlations across studies is due, at least in part, to presented here strongly suggests that estimates of general intellectual the inability of Equation 1 to predict IQ using the recentered SAT functioning obtained from SAT scores are accurate and acceptable, scores, although this equation is completely acceptable to use with and that the SAT-IQ conversion is to be encouraged, whether for archived data sets or individuals who took the SAT prior to the 1994 clinical application or in a research setting. recentering. Ultimately, Equation 2 is more appropriate for use with recentered scores. However, Equation 2 is likely to be most useful in predicting IQ at the high end of the distribution, and therefore must be used with caution. Acknowledgments—Parts of this work were supported by Grant No. Although Equation 2 is better suited than Equation 1 for predicting HD07176 from the National Institute of Child Health and Human high-end IQ, it is still a valuable tool. Individual difference re- Development, Office of Mental Retardation. searchers at colleges and universities will likely encounter a popu- lation similar to the one on which Equation 1 was based. Therefore, this equation will provide an alternative to time-consuming intelli- REFERENCES gence testing if only an estimate of IQ is required. Barnes, J.E. (2002, November 11). The SAT revolution: The new test spells the An alternative explanation of the differences between Equations 1 end of IQ- and big changes for American education. Retrieved November and 2 can be based on the measures of general intelligence used. The 18, 2002, from U.S. News & World Report Web site: http://www. ASVAB first factor, although often used as a surrogate for g, can be usnews.com/usnews/issue/021111/education/11sat.htm of as primarily measuring what individuals have learned, or Brodnick, R.J., & Ree, M.J. (1995). A structural model of academic perfor- crystallized intelligence (Gc; Roberts et al., 2000). In contrast, the mance, socioeconomic status, and Spearman’s g. Educational and Psy- Raven’s APM is typically characterized as a test of reasoning ability, chological Measurement, 55, 583–594. Jensen, A.R. (1998). The g factor: The science of mental ability. Westport, CT: or fluid intelligence (Gf). Therefore, Equation 1 may be likely to best Praeger. reproduce IQ as measured in tests of Gc, most notably the widely used Karzmark, P., Heaton, R.K., Grant, I., & Matthews, C.G. (1985). Use of de- Wechsler scales, whereas Equation 2 may be more useful in predicting mographic variables to predict full scale IQ: A replication and extension. IQ as measured by tests of fluid reasoning. Further support for the Journal of Clinical and Experimental Neuropsychology, 7, 412–420. Gf-SAT correlation is evident in an article by Raz, Willerman, Lemann, N. (1999). The big test: The secret history of the American meritocracy. New York: Farrar, Straus, & Giroux. Ingmundson, and Hanlon (1983), who reported a correlation of .81 Morgan, R. (1990). Analyses of the predictive validity of the SAT and high between SAT total score and another nonverbal measure of Gf, the school grades from 1976 to 1985. In W.W. Willingham, C. Lewis, R. Cattell Culture Fair Intelligence Test; however, the extreme-groups Morgan, & L. Ramist (Eds.), Predicting college grades: An analysis design of the study may have resulted in an artificially high correlation. of institutional trends over two decades (pp. 195–212). Princeton, NJ: Unfortunately, the usefulness of Equation 2 may not last long. Educational Testing Service. Currently, the SAT is being revised to shift the focus from general Raven, J., Raven, J.C., & Court, J.H. (1998). Raven Manual: Sec. 4. Advanced Progressive Matrices (1998 ed.). Oxford, England: Oxford Psychologists reasoning ability to academic achievement. The objective is to test Press. content knowledge rather than intelligence (Barnes, 2002). After this Raven, J.C. (1978). Advanced Progressive Matrices: Set II (1962 revision). overhaul of the SAT is completed in 2005, another examination of the London: H.K. Lewis & Co. relationship between SAT scores and general cognitive ability will be required to determine if the SAT will still be an adequate measure of general intelligence. We expect that it will. 1We would like to thank for this suggestion.

Volume 15—Number 6 377 Scholastic Assessment Test and g

Raz, N., Willerman, L., Ingmundson, P., & Hanlon, M. (1983). Aptitude-related Seligman, D. (1994). A question of intelligence. New York: Citadel Press. differences in auditory recognition masking. Intelligence, 7(2), 71–90. Willingham, W.W., & Breland, H.M. (1982). Personal qualities and college Ree, M.J., & Carretta, T.R. (1994). Factor analysis of the ASVAB: Confirming a admissions. New York: College Entrance Examination Board. Vernon-like structure. Educational & Psychological Measurement, 54, 459–463. Roberts, R.D., Goff, G.N., Anjoul, F., Kyllonen, P.C., Pallier, G., & Stankov, L. (2000). The Armed Services Vocational Aptitude Battery (ASVAB): Little more than acculturated (Gc)!? Learning and Individual Differ- ences, 12(1), 81–103. (RECEIVED 2/19/03; REVISION ACCEPTED 5/2/03)

378 Volume 15—Number 6