Problems With Individual Difference Measures Based on Some Componential Cognitive Paradigms

William P. Dunlap, Tulane University
Robert S. Kennedy, Essex Corporation
Mary M. Harbeson, Tulane University
Jennifer E. Fowlkes, Essex Corporation

Applied Psychological Measurement, Vol. 13, No. 1, March 1989, pp. 9-17. Copyright 1989 Applied Psychological Measurement Inc.

This article demonstrates that slope and ratio scores may have the same psychometric difficulty as difference scores: low reliability. Empirically, direct human performance measures and derived scores from Baron's, Collins', Meyer's, and Posner's cognitive paradigms were examined in terms of their reliabilities and cross-correlations. Reliabilities of the direct measures and their intercorrelations were high. The derived measures, which were slope, ratio, and difference scores, had reliabilities near zero and, therefore, their cross-correlations were also low. It is concluded that derived scores, although intuitively appealing as measures of mental operations, may have inherent psychometric difficulties that render them of little value for differential prediction. Index terms: cognitive paradigms, difference scores, individual differences, prediction, ratio scores, reliability, slope scores.

In recent years, measures of individual differences based on componential cognitive theory have been developed to supplement or replace traditional psychometric measures frequently used for selecting and training individuals (Carroll, 1976; Farr, 1984; Hunt, 1978, 1983, 1984; Office of Naval Research, 1987; Sternberg, 1986). Most componential theories of cognition are variations of an information-processing model involving sequences of mental operations (Carroll, 1988; Farr, 1984; Wickens, 1984). Researchers often attempt to isolate these mental operations by computing derived measures such as slope, difference, or ratio scores. This approach appears promising, as aspects of human cognitive performance exist that are not adequately measured by traditional IQ and ability tests (Carroll, 1976).

This research stemmed from three basic observations concerning the constructs developed in some cognitive paradigms:
1. Although existing cognitive paradigms have intuitive appeal because they are factorially rich (i.e., they measure a number of different mental constructs), the variance shared across various paradigms may be high, because a thread common to many of them involves contrasting latency-based performance on simple versions of a task with performance on increasingly complex forms of the same task;
2. Many derived scores in cognitive paradigms are based on slopes, which may have the same inherent statistical difficulties as difference and percent scores (Carter, Krause, & Harbeson, 1986); and
3. Ratio scores may have the same psychometric difficulties as difference and slope scores, which may cause their reliabilities to be low and render them impotent as predictors.

Clearly, measures denoting structure in intellectual functioning do not need to be good differential predictors to provide useful descriptions of cognitive processes. The extent to which most persons show the characteristic in question may be important, correct, and descriptive; however, if there is not a large and reliable range of individual differences in the characteristic, it will suffer as a predictor. The present study was concerned with the use of the derived scores for predictive purposes.
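The derived measures at issue can be made concrete with a small sketch. The code below is not taken from the article or from any of the paradigms examined here; the task structure, function names, and reaction-time values are illustrative assumptions. It shows how difference, ratio, and two-point slope scores are commonly formed from latencies on a simple and a more complex version of the same task.

```python
# Illustrative sketch (not from the original article): forming the three
# kinds of derived scores discussed in the text from a subject's mean
# reaction times on a simple and a more complex version of a task.
# The load values stand in for task complexity (e.g., number of choices).

def derived_scores(rt_simple, rt_complex, load_simple=1, load_complex=4):
    """Return difference, ratio, and two-point slope scores (RTs in ms)."""
    difference = rt_complex - rt_simple                   # added latency
    ratio = rt_complex / rt_simple                        # proportional slowing
    slope = difference / (load_complex - load_simple)     # ms per unit of load
    return {"difference": difference, "ratio": ratio, "slope": slope}

# Hypothetical subject: 350 ms simple RT, 530 ms four-choice RT
print(derived_scores(350.0, 530.0))
# {'difference': 180.0, 'ratio': 1.514..., 'slope': 60.0}
```

Note that the two-point slope is simply the difference score divided by a constant, so whatever statistical weaknesses difference scores have are inherited directly by such slopes, a point developed formally below.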
The use of derived cognitive scores as individual difference measures is typified by the work of Rose and colleagues (Rose, 1978; Rose & Fernandes, 1977), who developed an information-processing performance battery to be used as a selection tool. When developing the battery, they attempted to include information-processing tests with high reliability, statistical independence, and construct validity. Each information-processing task was supposed to involve at least one of the following operations in addition to encoding and responding: constructing, transforming, storing, retrieving, searching, and comparing. Different information-processing tests were assumed to tap different mental operations, and individual differences on different cognitive tasks were assumed to indicate relative skill in each operation.

From a differential prediction standpoint, the question is not whether mental operations or information-processing components exist, but whether the individual differences in these constructs are sufficient to make them useful predictors. Mental operations intuitively appear to take place in stages that take a measurable amount of time. The latency of certain processes can be demonstrated (Brown, 1958; Peterson & Peterson, 1959; Sperling, 1960). The fact that mental events take a finite amount of time, however, does not necessarily mean that there are reliable indicators of individual differences in those events.

For example, it has been known since the time of Donders (1868/1969; Smith, 1968) that reaction time for four choices takes longer than simple reaction time. However, the reaction time slope, which purportedly measures speed of complex information processing, has long been known to be unreliable. Another example is the Stroop (1935) phenomenon. It virtually always takes longer to name the color of words printed in conflicting colors than it does to read the names of the colors printed in black and white. Harbeson, Krause, Kennedy, and Bittner (1982), however, found that the difference score, which supposedly measures interference, is not reliable. In addition, the difference score does not appear to measure a construct separable from the scores from which it was derived.

In both examples, although differences exist in group performance between a simple and a more complex condition, individual difference scores are not reliable. In each case, the basic scores were reliable and were as highly correlated with each other as was statistically possible.

Slope, ratio, and difference scores do not always have reliabilities so low that they would incapacitate derived scores as predictors. For examples of derived scores with reasonably useful reliabilities, see Jensen (1965) regarding derived scores from the Stroop (1935) phenomenon, or Rose (1974) and Rose and Fernandes (1977) for derived scores from other paradigms. However, given the following derivation and demonstration, it behooves researchers to address the reliability of derived scores before recommending them as predictors.

Reliability

Practice effects. Evaluating performance based on tests that are not stable with repeated testing creates at least two related dangers. First, constructs that change over time are unstable, and prediction from measures of such a construct will be compromised. Second, measurements of nonstable constructs can be misconstrued as unreliable. An adequate assessment of reliability can only be performed on a stabilized variable. Because reliability defines the upper limit to validity, it is essential to establish the reliabilities of putative measures of individual differences in cognitive ability prior to employing such scores in prediction or selection.

Difference scores. It has long been recognized that difference scores frequently have low reliability (Cronbach & Furby, 1970). Whenever the correlation between two different tasks is near the reliabilities of those tasks, difference scores will be severely limited in reliability. This can be seen from the following equation for the reliability of a difference score (Cohen & Cohen, 1975, p. 64):

r_dd = [(r_aa + r_bb)/2 - r_ab] / (1 - r_ab),

where r_aa and r_bb are the reliabilities of Task A and Task B, and r_ab is the correlation between them.
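To make the behavior of this equation concrete, the following minimal sketch (added here for illustration; the reliability and correlation values are hypothetical) applies the formula and shows how the reliability of a difference score collapses as r_ab approaches the average of r_aa and r_bb.

```python
# Minimal numerical sketch of the difference-score reliability formula
# quoted above (Cohen & Cohen, 1975, p. 64), for standardized scores.
# The reliability and correlation values below are hypothetical.

def difference_score_reliability(r_aa, r_bb, r_ab):
    """Reliability of the difference A - B:
    r_dd = ((r_aa + r_bb)/2 - r_ab) / (1 - r_ab)."""
    return ((r_aa + r_bb) / 2.0 - r_ab) / (1.0 - r_ab)

r_aa = r_bb = 0.85  # reliabilities of Task A and Task B
for r_ab in (0.20, 0.50, 0.70, 0.80, 0.84):
    r_dd = difference_score_reliability(r_aa, r_bb, r_ab)
    print(f"r_ab = {r_ab:.2f} -> r_dd = {r_dd:.2f}")

# Output: as r_ab rises toward (r_aa + r_bb)/2 = 0.85, r_dd falls toward 0:
#   r_ab = 0.20 -> r_dd = 0.81
#   r_ab = 0.50 -> r_dd = 0.70
#   r_ab = 0.70 -> r_dd = 0.50
#   r_ab = 0.80 -> r_dd = 0.25
#   r_ab = 0.84 -> r_dd = 0.06
```

Highly reliable tasks that are also highly intercorrelated, which is exactly the situation reported above for the basic scores, therefore yield difference scores with reliabilities near zero.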
Slope scores. Measures based on slope scores, which appear with increasing frequency in cognitive research paradigms, have demonstrable inherent statistical weaknesses. Empirical studies have shown that a number of slope measures have low reliabilities (Carter et al., 1986), which indicates that slope scores may have lower reliabilities, often near zero, than the scores from which they are derived. This occurs because, mathematically, slope scores can be recognized as either difference scores or ratio scores, and difference and ratio scores share the common statistical difficulty of low reliability whenever r_ab approaches the average of r_aa and r_bb.

The goal of the present research was to study the reliability of derived measures from four cognitive paradigms. These paradigms were: graphemic and phonemic analysis (Baron, 1973; Baron & McKillop, 1975); semantic memory retrieval (Collins & Quillian, 1969); lexical decision making (Meyer, Schvaneveldt, & Ruddy, 1974); and letter classification (Posner & Mitchell, 1967).

The paradigms studied used a common method, reaction time to letters or words; therefore, the raw scores should share common variance. The derived scores were either differences, slopes, or ratios; the question addressed was whether such