Behavior Research Methods, Instruments, & Computers 1996, 28 (2), 209-213

A two-score composite program for combining standard scores

LARRY D. EVANS
University of Arkansas for Medical Sciences, Little Rock, Arkansas

It is often desirable for mental health practitioners to combine standard scores from different tests, raters, or times into a single composite standard score. Most often the result is a more reliable and accurate standard score. This paper describes a computer program that uses two standard scores, score reliability and correlation with a third variable, to yield a composite standard score, reliability and correlation. Trends, limitations, optimum benefits, and examples are discussed. References are provided for calculating composites based on more than two scores.

Copyright 1996 Psychonomic Society, Inc.

A copy of the compiled program is available from the author upon receipt of a self-addressed, postage-paid mailer and one DOS-formatted, high-density floppy disk. Correspondence should be addressed to Larry Evans, Dennis Developmental Center, 1612 Maryland Street, Little Rock, AR 72202 (e-mail: [email protected]).

Composite scores are routinely used by test publishers to derive an overall scale score from two or more subtests. The Full Scale IQ (FSIQ) of the Wechsler Intelligence Scale for Children-Third Edition (Wechsler, 1991), the Basic Reading Skills Cluster of the Woodcock-Johnson Psychoeducational Battery-Revised (Woodcock & Johnson, 1989), and the Adaptive Behavior Composite of the Vineland Adaptive Behavior Scales (Sparrow, Balla, & Cicchetti, 1984) are common examples of composite scores. While often used by test publishers, composite scores are less frequently derived by mental health practitioners. This is unfortunate, as an understanding of composite scores enhances the interpretation of composite scores, such as the FSIQ, and provides a tool that may be required to synthesize psychological results.

Combining two or more same-domain scores into a single composite score offers several advantages over interpretation of individual scores. First, one conceptualization of a composite score is a score from a single test composed of the items of several individual tests (Thorndike, Cunningham, Thorndike, & Hagen, 1991). This "longer" test generally results in a more reliable score. Second, composite scores avoid loss of information. For example, when two tests measure the same domain and yield different scores, practitioners often try to determine which score is more accurate, often by comparing scores with an overall profile, selecting the more reliable or comprehensive score, or relating subject behavior, during testing, to each score. Regardless of what single score is thought to be the more accurate, the net effect of using only one test is loss of information. Because the composite score is based on the scores of both tests, there is no sacrificing of test data. Third, composite scores offer great flexibility in test result interpretation. Weights can be assigned to scores on the basis of the purpose of the composite score. Wang and Stanley (1970) reviewed various methods to derive weights, but basic techniques may be based on reliability coefficients, test length, optimal correlation with a criterion, or subjective judgments of the importance of each test's contribution to a composite.

There are specific instances in which composite scores may be highly desirable. First, practitioners may give a second test to confirm an initial test's results, then need to combine the scores to reflect the best indication of overall performance in a particular domain. Indeed, federal and most state special education guidelines discourage reliance on a single score or procedure during decision making. The State of Arkansas, for example, requires that after an initial achievement test has been administered to a student suspected of having a learning disability, a second achievement test must be given to further assess the area of suspected disability (Arkansas Department of Education, 1993). Second, after assessing a subject, a practitioner may wish to obtain results of a recent evaluation in some other setting, either incidentally or as a blind second opinion, and to combine same-domain scores to obtain a more accurate picture of the subject's overall abilities. Third, practitioners may use screening or abbreviated test batteries, then later provide more comprehensive evaluation for failed screenings. It is often desirable to then combine the results of the screening and comprehensive scores (generally weighting the comprehensive score more heavily) so as not to lose the time and expense required for the screening. Fourth, comprehensive behavioral assessment may include subject ratings from several individuals, times, or settings. A composite score may then be needed to summarize the ratings and increase overall reliability.

Statement of the Problem

Most practitioners would benefit from being able to combine scores into a composite score. The resulting composite score would often provide a more reliable and valid summary score. Better understanding of composite scores would also aid in interpreting the composite scores provided by tests. Unfortunately, the equations needed to

calculate composite scores are not readily available to most practitioners and are not reliably performed by hand calculations.

Response to the Problem

This paper describes a computer program designed to calculate a standard score composite based upon two standard scores. A two-score composite may be the most common composite and, as the simplest composite, provides a basis for assessing the relative benefits of multiple-score composites.

COMPOSITE SCORE PROGRAM

A program was written using Version 1.0 of Microsoft Visual Basic for MS-DOS. After an initial start-up screen of instructions, the program displays a single screen with edit fields that allow the user to enter data for two test scores (standard score, reliability, correlation with a criterion, and score weight) and the correlation between the two scores. After a push button is clicked, the calculations are performed and the composite standard score, its reliability, and its correlation with a criterion are shown at the bottom of the screen. A single pull-down menu allows the user to print the results, erase the entered data, or exit the program.

The program is intentionally straightforward, with an emphasis on application. A key to successfully using the program is an understanding of the equations and output. Therefore, the following equations and examples, which are intended to help the user understand and utilize the program results, are provided.

Calculating the Composite Standard Score

The first calculation involves determining a composite score:

$$ X_c = b_1 X_1 + b_2 X_2 \tag{1} $$

where $X_c$ is the composite score, $b_1$ and $b_2$ are weights, and $X_1$ and $X_2$ are the obtained standard scores. If equal weights of 1.00 are assigned to each obtained score, then Equation 1 simplifies to the sum of the obtained scores.

The composite score of Equation 1 is not a standard score in the same metric as the obtained scores, and requires that the composite's mean and standard deviation be calculated before it can be interpreted. The equation for the composite score's mean is

$$ M_{x_c} = b_1 M_{x_1} + b_2 M_{x_2} \tag{2} $$

where $M_{x_c}$ is the composite mean, and $M_{x_1}$ and $M_{x_2}$ are the means for the obtained standard scores. The composite's standard deviation is given by the equation

$$ SD_{x_c} = \sqrt{b_1^2 SD_{x_1}^2 + b_2^2 SD_{x_2}^2 + 2 b_1 b_2 r_{12} SD_{x_1} SD_{x_2}} \tag{3} $$

where $SD_{x_c}$ is the composite's standard deviation, $SD_{x_1}$ and $SD_{x_2}$ are the standard scores' standard deviations, and $r_{12}$ is the Pearson product-moment correlation coefficient for the standard scores. If the obtained standard scores are initially expressed as z scores and equal weights of 1.00 are assigned, then Equation 3 simplifies to

$$ SD_{x_c} = \sqrt{2 + 2 r_{12}} \tag{4} $$

Equations 3 and 4 indicate that, for obtained scores with the same standard deviation, as the correlation increases toward 1.00, the composite standard deviation approaches twice the obtained scores' standard deviation.

The final step is to convert the composite score to a standard score with the same mean and standard deviation as one or both of the obtained scores. This involves first converting the composite score to a z score, then to a standard score in the same metric as one or both of the obtained scores. Using the above equations, the z score for the composite score is

$$ z_{x_c} = \frac{X_c - M_{x_c}}{SD_{x_c}} \tag{5} $$

The z score for the composite ($z_{x_c}$) can be multiplied by the standard deviation of an obtained score and the product then added to the same obtained score's mean in order to produce a composite standard score that is interpretable relative to the obtained scores.

One synthesis of the above equations is that, as the correlation $r_{12}$ increases toward 1.00, the composite standard score approaches the (weighted) average of the obtained scores. As the correlation decreases, the composite standard score increasingly deviates from this average, with the direction of deviation being away from the (weighted) mean.

Calculating the Composite's Reliability

The composite standard score, as with any standard score, should not be interpreted without taking the error of measurement into account (Salvia & Ysseldyke, 1991). The composite's reliability is therefore important for determining the standard error of measurement and the estimated true composite score. The composite's reliability may also be important to determine if the score differs significantly from another standard score (Payne & Jones, 1957) or if the composite score is to be entered into a software analysis program such as regression software for learning disabilities (e.g., Evans, 1994). The composite's reliability also has meaning, if only when compared with the reliabilities of the scores forming the composite. As will be seen, composite reliabilities that vary markedly from those of their contributing scores provide information about the relationship between those scores and their suitability in forming a composite.

The equation for the reliability of the composite standard score is given as

$$ r_{x_c} = \frac{b_1^2 r_{x_1} + b_2^2 r_{x_2} + 2 b_1 b_2 r_{12}}{b_1^2 + b_2^2 + 2 b_1 b_2 r_{12}} \tag{6} $$

where $r_{x_c}$ is the composite's reliability, and $r_{x_1}$ and $r_{x_2}$ are the reliabilities for the obtained standard scores. When the weights are set to 1.00, Equation 6 becomes

$$ r_{x_c} = \frac{r_{x_1} + r_{x_2} + 2 r_{12}}{2 + 2 r_{12}} \tag{7} $$

With weights, scores, means, standard deviations, and reliabilities held constant, it is evident from Equations 4, 5, and 7 that, as the correlation between measures decreases toward zero, the composite standard score deviates more from the mean and the reliability approaches the average reliability of the measures. As the correlation increases, the composite standard score deviates less from the mean and composite reliability increases.

Calculating the Composite's Correlation With a Criterion

In addition to increased reliability, a composite score may also provide higher correlation with a criterion score and as a result better prediction. The composite correlation coefficient is:

$$ r_{xy_c} = \frac{b_1 r_{13} + b_2 r_{23}}{\sqrt{b_1^2 + b_2^2 + 2 b_1 b_2 r_{12}}} \tag{8} $$

where $r_{xy_c}$ is the correlation coefficient between the composite and criterion score, $r_{13}$ is the correlation coefficient between the scores of one test and the criterion, and $r_{23}$ is the correlation between the scores of a second test and the criterion.

SAMPLE CALCULATIONS OF COMPOSITE SCORES

As part of a psychoeducational evaluation, a school psychologist administers a general achievement test and a reading test. Both tests yield several scores, including reading comprehension scores with a mean of 100 and standard deviation of 15. Say a student received a standard score of 75 for the general achievement test's reading-comprehension subtest that was based primarily on items measuring literal comprehension of words, phrases, and short sentences. The standard score of 69 from the reading-comprehension portion of the reading test included items measuring inferential comprehension, reorganization, and critical reading for paragraphs and short passages. A composite reading comprehension score is desired because of the complementary test items and comprehensiveness of a combined assessment. Both scores are given a weight of 1.00. The respective test manuals indicate an internal consistency reliability coefficient for the score of 75 as .90 and for 69 as .93. Both test manuals also indicate correlations of .65 and .68, respectively, with the intelligence test given to the student. A concurrent validity study in one test manual shows a correlation between reading comprehension portions of both tests as .78 for a nonreferred sample.

The data are entered into the composite score program. The results show a composite standard score of 70.32 with a reliability of .95 and correlation with the IQ score of .70. The achievement composite score can then be used to determine if a severe discrepancy exists between IQ and achievement.

Table 1 provides several examples of composite-score program results. The obtained scores of Table 1 have a mean of 100, standard deviation of 15, and identical weights. The first group of four rows contains four pairs of matching scores with high reliability and correlation for each pair. The result is a composite standard score that hardly differs from the obtained standard scores and has only a small increase in reliability. The second group of scores shows the same scores with the same high reliability but only a moderate correlation. The composite standard scores are lower than those of the first group, demonstrating the tendency for composite standard scores to move farther away from the average of the obtained scores as the correlation decreases. The composite reliability for the second group is lower than that for the first, although to two decimal places they are the same.

Table 1
Composite Scores and Reliabilities Based on Two Equally Weighted Standard Scores

          Obtained Scores                    Composite Scores
Standard                                  Standard
Scores      Reliability*   Correlation    Score      Reliability
100, 100    .95, .95       .90            100        .97
 85,  85    .95, .95       .90             85        .97
 70,  70    .95, .95       .90             69        .97
 55,  55    .95, .95       .90             54        .97

100, 100    .95, .95       .75            100        .97
 85,  85    .95, .95       .75             84        .97
 70,  70    .95, .95       .75             68        .97
 55,  55    .95, .95       .75             52        .97

100, 100    .80, .80       .75            100        .89
 85,  85    .80, .80       .75             84        .89
 70,  70    .80, .80       .75             68        .89
 55,  55    .80, .80       .75             52        .89

100, 100    .90, .90       .40            100        .93
 85,  85    .90, .90       .40             82        .93
 70,  70    .90, .90       .40             64        .93
 55,  55    .90, .90       .40             46        .93

100, 100    .70, .70       .40            100        .79
 85,  85    .70, .70       .40             82        .79
 70,  70    .70, .70       .40             64        .79
 55,  55    .70, .70       .40             46        .79

100,  80    .90, .85       .70             89        .93
 85,  70    .85, .85       .60             75        .91
 60,  70    .88, .92       .75             63        .94
115, 125    .90, .90       .70            122        .94
115,  90    .90, .85       .67            103        .93

Note: For illustrative purposes, all obtained scores have a mean of 100 and a standard deviation of 15. *Internal consistency reliability coefficients are shown.
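The calculations described in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the author's Visual Basic program: the function name `composite`, its parameter names, and its defaults are hypothetical. It assumes that both obtained scores share one mean and standard deviation, so that they can be converted to z scores first, and it assumes a weighted reliability formula chosen because it reduces to Equation 7 when both weights equal 1.00.

```python
from math import sqrt


def composite(x1, x2, r12, rel1, rel2, r13=None, r23=None,
              b1=1.0, b2=1.0, mean=100.0, sd=15.0):
    """Two-score composite for scores sharing one mean and SD (assumption).

    Returns (composite standard score, composite reliability,
    composite correlation with the criterion or None).
    """
    # Express both obtained scores as z scores (mean 0, SD 1).
    z1 = (x1 - mean) / sd
    z2 = (x2 - mean) / sd

    # Equation 1 in z units; by Equation 2 the composite mean is then 0.
    raw = b1 * z1 + b2 * z2

    # Equation 3 with both SDs equal to 1 (Equation 4 when b1 = b2 = 1).
    var_c = b1 ** 2 + b2 ** 2 + 2 * b1 * b2 * r12
    sd_c = sqrt(var_c)

    # Equation 5, then back to the metric of the obtained scores.
    z_c = raw / sd_c
    score = mean + sd * z_c

    # Assumed weighted reliability; equals Equation 7 for equal weights.
    reliability = (b1 ** 2 * rel1 + b2 ** 2 * rel2 + 2 * b1 * b2 * r12) / var_c

    # Equation 8, computed only when both criterion correlations are given.
    validity = None
    if r13 is not None and r23 is not None:
        validity = (b1 * r13 + b2 * r23) / sd_c

    return score, reliability, validity


# Sample calculation from the text: scores of 75 and 69, r12 = .78.
score, rel, val = composite(75, 69, r12=0.78, rel1=0.90, rel2=0.93,
                            r13=0.65, r23=0.68)
print(f"{score:.2f} {rel:.2f} {val:.2f}")  # prints "70.32 0.95 0.70"

# One row of Table 1: scores 55 and 55, reliabilities .90, correlation .40.
s, r, _ = composite(55, 55, r12=0.40, rel1=0.90, rel2=0.90)
print(f"{s:.0f} {r:.2f}")  # prints "46 0.93"
```

Working in z units keeps the sketch short because the standard deviation terms of Equation 3 drop out. Scores reported in different metrics would need to be standardized separately before being combined this way.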

The third group of obtained scores is the same as the second except for lower reliability values. Because reliability does not directly affect the composite standard score, these scores can be taken as the same as those for the second group. Of the same three groups, however, the third group's composite reliability shows a greater absolute increase over its obtained scores. This is primarily due to the fact that it had greater room for improvement than did the second group, but also to the fact that its correlations were closer to its reliabilities. That is, as the upper bound for the correlation is the lower of the reliabilities, the composite reliability increases as the correlation increases (Equation 7) and reaches its upper limit as the correlation approaches the lower of the reliability coefficients.

The fourth group of scores shown in Table 1 has high reliabilities but lower correlations than the other groups. As expected, the composite standard scores are farther away from the average of the obtained scores, and the composite's increase in reliability over the obtained scores is small, as the correlation is closer to zero than the lower reliability value. When the reliabilities decrease and the correlation remains the same (as in, e.g., the fifth group of scores), the composite standard scores predictably remain the same, but the composite reliability shows a greater relative, though smaller proportional, increase.

The first five groups of scores had identical scores of matching reliabilities, a convenience selected to allow only reliabilities or correlations to vary. The sixth group of scores shows the more common outcome of differences in obtained scores and reliabilities. Although less evident than with the first five groups, the same tendencies are found: the deviation of the composite standard score and reliability from the average of the obtained scores is directly related to the size of the correlation between scores.

Many practitioners expect that when two obtained scores are identical, the composite score will equal the obtained scores. As Table 1 shows, this is true only when the obtained scores equal the mean (or for the improbable case of two perfectly reliable and correlated scores). Otherwise, the composite score is more extreme, that is, farther from the mean, than the average of the obtained scores.

DISCUSSION

This paper describes a program used to derive a composite standard score based on two standard scores. The equations of the program are presented to enhance understanding and application. For the equations showing the general case for determining composite standard scores, reliabilities, and correlations, the reader is referred to Dunteman (1984) and Mosier (1943). Equation complexity grows considerably when more than two scores are used to derive a composite. For example, the composite's standard deviation alone includes 8 terms for a two-score composite, 18 for a three-score composite, and 36 for a four-score composite. From the discussion of Table 1, it is clear that composites based on more than two scores may not offer appreciable increases in reliability except in cases where no measures with high reliability exist and the correlation between measures is at least moderate.

In what circumstances would the composite-score program be most beneficial? From Table 1 it appears that composite scores may offer the most benefits in two primary circumstances. The first occurs when a highly reliable test does not exist to assess a particular domain but two or more moderately reliable measures of moderate correlation do exist. The resulting increase in composite reliability may overcome the absence of a highly reliable instrument. The second circumstance occurs when two or more tests complement each other and a composite is needed to obtain a comprehensive score. Such a circumstance may occur, for example, when tests assess different areas of the same domain, raters measure the same domain, or the same information is gathered at two different times.

Previous discussion has based weights upon a test feature (e.g., reliability) or a desired contribution to the composite score. Such weighting methods may not be desirable when scores are correlated. In such cases, it is possible to calculate optimal weights such as those used in multiple regression equations. Govindarajulu (1988) presents a comparison of the equations and outcomes of optimal and other weighting methods. Optimal weights provide the highest correlation between the composite standard score and a criterion. Gains in correlation, reliability, and so forth, from derived weights may offer only negligible improvement over equal weighting (Guilford, 1954), resulting in a need to match any weighting method with the composite's role in the assessment and to select weights on an a priori basis to avoid data manipulation.

Several limitations to composite scores exist. The equations and discussion presented have assumed linear, rather than curvilinear, relationships between scores. It also would be inappropriate to include a score from a spoiled or suspect test, even if its contribution was minimized with a relatively small weight. Likewise, differences in scores due to time of assessment, evaluators, normative data, or domain should be minimized unless it is the composite's function and is reflected in the correlation between scores. Composite scores are not limited to normally distributed standard scores despite the previous emphasis (some weighting methods may assume normally distributed scores). Indeed, composite score statistics are often presented in the context of standardized student scores from classroom tests (e.g., Thorndike et al., 1991).

Practical implications of composite scores are primarily related to the integration of test results, improved reliability and validity, and the resulting impact on decision making. The primary value of composite scores may lie in increasing the technology of score interpretation and providing practitioners with an additional tool to test hypotheses and obtain a clearer picture of subjects' abilities. Therefore, future research with standard score composites may best be able to investigate the outcomes of using composite scores versus individual scores. In addition to impact on decision making, the research could include differential improvements in agreement (e.g., interrater, classification, stability, interdomain, interinterviewer) with a criterion when composite scores are used.

REFERENCES

ARKANSAS DEPARTMENT OF EDUCATION (1993). Program standards and eligibility criteria for special education. Little Rock, AR: Author.
DUNTEMAN, G. H. (1984). Introduction to multivariate analysis. Beverly Hills, CA: Sage.
EVANS, L. D. (1994). Standard score regression comparison 3.0 [Computer software]. North Little Rock, AR: WtL.
GOVINDARAJULU, Z. (1988). Alternative methods for combining several test scores. Educational & Psychological Measurement, 48, 53-60.
GUILFORD, J. P. (1954). Psychometric methods. New York: McGraw-Hill.
MOSIER, C. I. (1943). On the reliability of a weighted composite. Psychometrika, 8, 161-168.
PAYNE, R. W., & JONES, H. G. (1957). Statistics for the investigation of individual cases. Journal of Clinical Psychology, 13, 115-121.
SALVIA, J., & YSSELDYKE, J. E. (1991). Assessment (5th ed.). Boston: Houghton Mifflin.
SPARROW, S. S., BALLA, D. A., & CICCHETTI, D. V. (1984). Vineland Adaptive Behavior Scales: Survey Form. Circle Pines, MN: American Guidance.
THORNDIKE, R. M., CUNNINGHAM, G. K., THORNDIKE, R. L., & HAGEN, E. P. (1991). Measurement and evaluation in psychology and education (5th ed.). New York: Macmillan.
WANG, M. W., & STANLEY, J. C. (1970). Differential weighting: A review of methods and empirical studies. Review of Educational Research, 40, 663-705.
WECHSLER, D. (1991). Wechsler Intelligence Scale for Children-Third Edition. San Antonio, TX: Psychological Corporation.
WOODCOCK, R. W., & JOHNSON, M. B. (1989). Woodcock-Johnson Psychoeducational Battery-Revised. Allen, TX: DLM.

(Manuscript received November 20, 1995; accepted for publication January 3, 1996.)