CHAPTER 20

TESTING AND INDUSTRIAL APPLICATION

Robert D. Gatewood Robert Perloff Evelyn Perloff

INTRODUCTION

To a large extent performance of an organization is dependent upon performance of its individual members. Because individual performance is a function of ability, motivation, and situational factors (Borman, 1991), the many organizational programs that affect any of these three variables (e.g., selection, job design, compensation, training, performance measurement) are important to the overall well-being of the organization. They, therefore, must be designed and implemented appropriately. Because each of these organizational programs uses data about characteristics of employees, a key component of their appropriate design and implementation is the assessment of individuals. For example, pay-for-performance compensation systems are based upon assessment of individual/group work performance. While these organizational programs have used data from the assessment of numerous characteristics of individuals, measurement of five types of characteristics has been dominant in these programs: (a) knowledge, skills, and abilities; (b) personality; (c) physiological attributes; (d) attitudes; and (e) job performance. That is, data from at least one of these five types of characteristics are used in nearly every program which is intended to increase employee performance.

The purpose of this chapter is to discuss assessment of these five types of characteristics. In doing this, we will describe the major types of instruments that are used for these assessments and summarize the research as to their use within organizations. Because of equal employment opportunity laws, significant differences in the scores of demographic groups on these assessment instruments may be justification for a legal review of the assessment procedures. For this reason we will also comment on the legal implications of using these devices. As an introduction we will briefly describe the five types of characteristics and some of the organizational programs for which the data about these characteristics are used.

Knowledge, skills, and abilities (KSAs) refer, respectively, to the amount of factual information known by an individual, his or her conduct of job-specific activities such as the operation of a particular piece of machinery, and his or her conduct of generalized job activities such as statistical analysis and verbal expression. These characteristics have been demonstrated to be strongly related to the job performance of employees (Gatewood & Feild, 1998). Therefore, assessment of KSAs is essential to programs such as recruitment, selection, job design, training, career development, and some skill-based compensation.

HANDBOOK OF PSYCHOLOGICAL ASSESSMENT

Personality, which has been defined and discussed in previous chapters, refers to characteristics such as thoughts, feelings, and behaviors that combine distinctly in each individual (Allport, 1961). Personality data have been included in several organizational programs. For example, these data have been used in selection since the development of the Army Beta test during World War I. Personality data have also become increasingly important in the implementation of job design and work process programs, such as autonomous work teams and employee involvement. Because these two, and other closely related, programs emphasize ongoing work interactions among team members, it is commonly thought that there must be compatibility among team members in several characteristics, including personality, in order for a team to work effectively. Additionally, personality data have also been used in leadership training and career counseling.

Assessments of physiological attributes, such as hearing, vision, strength, and coordination, have been used for both selection and design of work equipment and processes. Work attitudes, such as job satisfaction and organizational commitment, are employees' cognitive and/or affective responses to aspects of the work environment (Hulin, 1991). Therefore, attitude data have been obtained in order to measure employees' reactions to a wide variety of organizational programs, such as promotion opportunities, compensation, benefits, and task activities. Also, measured change in attitudes has been the subject of several types of training programs, especially those that address the management of a culturally diverse workforce. Finally, job performance data, which are measures of various aspects of work activities or output, are extensively used in selection, promotion, training, and compensation programs.

ASSESSMENT OF KNOWLEDGE, SKILLS, AND ABILITIES

Achievement and aptitude tests, application forms, interviews, and performance tests are the assessment devices which have been used the most often to measure KSAs.

Achievement and Aptitude Tests

Achievement and aptitude tests are paper-and-pencil, usually group-administered tests that emphasize factual information and its use in problem solving. The distinction between achievement and aptitude is made primarily on the basis of the use of test results (Anastasi, 1982). Achievement test results are used to assess present levels of knowledge, while aptitude tests are used to predict future performance of individuals. In reality, the two types of tests are almost identical in terms of content.

Cognitive Tests

One of the most extensively used achievement/aptitude measures within organizations involves cognitive ability tests, which are discussed in other chapters of this text. Tests of this type usually tap a variety of characteristics, such as memory span, numerical fluency, general reasoning, verbal comprehension, spatial orientation, perceptual relations, and logical evaluation. One popular example of this type of test is The Wonderlic Personnel Test, which is a 12-minute, multiple-choice instrument that consists of 50 items and has been printed in at least 14 different forms. Although cognitive ability tests have been used in selection for over 80 years, frequency of their use has diminished since the early 1970s as the result of Supreme Court decisions regarding discrimination in selection. In widely publicized cases, courts found that defendant organizations, some of whom used the Wonderlic as a standard part of their selection programs, were guilty of race discrimination. The basis for these cases was that the use of cognitive ability tests inevitably results in "adverse impact," that is, a significantly larger percentage of minority applicants being rejected for employment than nonminority applicants. ("Adverse impact" upon a group, usually a minority group, occurs in selection when the selection rate for applicants of any one demographic group is less than 80% of the selection rate of the demographic group with the highest selection rate. For example, if 30 of 50 [60%] male applicants are selected and if the applicant pool for females is 40, adverse impact would occur if fewer than 20 females are selected [40 x .60 x .80 = 19.2, so at least 20 must be selected].) In these early court cases, the organizations using the cognitive ability tests did not have adequate evidence to support their use in selection and, subsequently, failed to justify the adverse impact that resulted.

However, more recent research on cognitive ability tests has consistently demonstrated that these tests are empirically related to job performance and, therefore, should be legally defensible in selection programs. One example of such research is Project A, which was a seven-year study to develop a selection system appropriate for all entry-level positions in the U.S. Army (Campbell, 1990). Project A collected data from over 4,000 incumbents and developed six domains of predictor instruments. Among these domains were general cognitive ability and spatial ability. Data analyses indicated that these two cognitive ability domains were more strongly related to job performance measures than were the tests in the other four domains. General cognitive ability correlated .63 and .65 with the two performance measures that most directly measured task performance (i.e., core technical proficiency and general task proficiency). Spatial ability correlated .56 and .63 with the same two measures.

Other evidence of the validity of cognitive ability measures has been produced by a series of validity generalization studies. The principle of validity generalization is that individual-company studies that have computed the correlation between measures of cognitive ability and job performance are hampered by a number of methodological limitations that artificially affect the resulting validity coefficient. Among the most serious of these methodological limitations are small sample size, unreliability of the predictor and criterion measures, and restriction in range on both measures. Validity generalization studies correct each of these methodological limitations in each of a group of previously conducted individual-company validation studies which have used the same predictor and criterion measures. After these corrections, a corrected validity coefficient is calculated which combines data across all of the previous studies. Table 20.1 contains results of validity generalization studies that have been conducted for cognitive ability tests for specific jobs. As can be seen, several different cognitive ability tests have been found to be significantly related to performance with corrected validity coefficients ranging from .30 to .64, the latter being quite high for a single selection measure.

Table 20.1. Validity Generalization Coefficients of Cognitive Ability and Job Performance for Selected Jobs

JOB / TEST TYPE                                      CORRECTED VALIDITY COEFFICIENT
Computer programmer (a)
    Figure analogies                                 .46
    Arithmetic reasoning                             .57
    Total score all tests                            .73
First-line supervisor (b)
    General mental ability                           .64
    Mechanical comprehension                         .48
    Spatial ability                                  .43
Computing and account-recording clerks (c)
    General mental ability                           .49
    Verbal ability                                   .41
    Quantitative ability                             .52
    Reasoning ability                                .63
    Perceptual speed                                 .50
    Memory                                           .42
Operator (petroleum industry) (d)
    General intelligence                             .26
    Arithmetic reasoning                             .26
Police and detectives (e)
    Quantitative ability                             .26
    Reasoning                                        .17
    Spatial/mechanical ability                       .17

Notes: (a) Schmidt, Gast-Rosenberg, & Hunter (1980); (b) Schmidt, Hunter, Pearlman, & Shane (1979); (c) Pearlman, Schmidt, & Hunter (1980); (d) Schmidt, Hunter, & Caplan (1981); (e) Hirsh, Northrup, & Schmidt (1986).

A second type of validity generalization study has been conducted by combining data from studies of different jobs within the same occupation. Hunter (1986) examined the validity of cognitive ability for nine occupations. He determined corrected validity coefficients for cognitive abilities to be .61 for salespersons, .54 for clericals, .53 for managers, and .48 for service workers. Corrected coefficients for the remaining five occupations
ranged from .28 to .46. These results indicate that cognitive ability is related to job performance for dissimilar jobs within an occupation and also for many occupations.

A third type of validity generalization study which further emphasized the validity of cognitive ability tests combined jobs by complexity level. Complexity level is a very broad grouping variable which measures the amount of decision making and information processing required by the job. Results, presented in Table 20.2, found that both job performance and performance in training programs were strongly related to cognitive ability. Moreover, the magnitude of the validity coefficient (Validity Performance in Table 20.2) increased as job complexity increased. The most likely explanation of these findings is that all jobs involve some information processing and problem-solving components. These components increase as jobs and occupations become more complex. Because cognitive ability tests are, in large part, measures of information processing and problem solving, they will be related to performance, especially in complex jobs and occupations. Use of cognitive ability tests for selection is expected to increase in the future as assembly-type manufacturing jobs decrease and service and technology jobs increase.

Table 20.2. Validity Generalization Coefficients as a Function of Job Complexity

JOB FAMILIES                 VALIDITY PERFORMANCE     VALIDITY TRAINING
General job families
    high complexity          .58                      .50
    medium complexity        .51                      .57
    low complexity           .40                      .54

Note: Hunter (1986, pp. 340-362)

As mentioned previously, a critical issue with use of cognitive ability tests in organizations is the adverse impact which accompanies these tests. At one time it was thought that differences in mean scores between minority and nonminority demographic groups on these tests were a function of "cultural bias" (in terms of test content which was unrelated to job performance). However, research has found that differences in test performance are, in fact, consistently related to differences in job performance among individuals. Cognitive ability tests are equally valid for almost all demographic groups. For example, one study examined 781 pairs of validity coefficients for African-American and white groups (Hunter, Schmidt, & Hunter, 1979). A graph of the pattern of these coefficients was drawn for each group. The two curves were almost identical, meaning that the cognitive ability tests acted in the same manner for both African-American and white groups. These findings and those of related studies have led to the conclusion that cognitive ability tests have minimal cultural bias (that is, little, if any, non-job-performance related differences) and can be validly used for selection across demographic groups.

A related study compared the use of 18 types of predictor instruments for selection (Reilly & Warech, 1991) on the basis of validity, adverse impact, feasibility (cost of development and use), and fairness to the applicant. Results determined that cognitive ability tests were ranked very highly among the 18 test types in both validity and adverse impact (differences in test scores). Additionally, large differences were demonstrated among the test types in feasibility. Generally, cognitive ability tests were more feasible to use than other types. Analyses of fairness to the applicant included evaluations of false rejection rates, perceived relevance by the applicant of test material to the job, and potential of improvement for employability for the rejected applicant. Cognitive ability tests performed well on the first two criteria and poorly on the third. Overall, this study concluded that cognitive ability tests can be recommended because of their validity and low cost. However, other tests do demonstrate less adverse impact and are more fair to applicants, although each has major deficiencies in terms of cost and ease of development and use. (An important point should be made here with respect to adverse impact. As mentioned previously, this term signifies test score differences between demographic groups. However, these test score differences are legally justifiable if they are related to differences in job performance. That is, if test differences and job performance differences occur in the same pattern, the test can legally be used for selection. This is the case in the use of cognitive ability tests in selection. They cause adverse impact, but this adverse impact is related to differences in job performance.)

Mechanical Ability Tests

A second widely used aptitude/achievement instrument includes mechanical tests, which measure either (a) knowledge about or skill in using tools, machines, and electrical equipment; or (b) verbal and mathematical ability to follow directions or make calculations concerning the use of mechanical tools. Within this general description, a wide variety of tests is available. These tests can have a broad array of topics (general mechanical ability) or be narrowly focused on one specific topic (e.g., electronics or welding).

An often-used general mechanical test is the Bennett Mechanical Comprehension Test (Gatewood & Feild, 1998). The items of this test contain objects that are almost universally familiar in American culture: airplanes, carts, steps, pulleys, seesaws, gears, and so on. The questions measure the respondent's ability to perceive and understand the relationship of physical forces and mechanical elements in practical situations. There have been six different forms of this test plus a Spanish-language version. Each form has approximately 68 items and no time limit for administration. Reported reliabilities, in the .80s, are quite favorable. Studies have correlated scores on the Bennett with other ability tests and have found correlations with verbal and mathematical ability and spatial visualization. Another frequently used general mechanical ability test is the MacQuarrie Test for Mechanical Ability. This is also a paper-and-pencil test that requires about 30 minutes to administer. It contains seven subtests: tracing, tapping, dotting, copying, location, blocks, and pursuit.

Examples of specific-topic tests are the Purdue Trade Tests, which include the Test for Electricians, Trade Information Test in Welding, and Trade Information Test in Engine Lathe Operation. Each one is a multiple-choice test of technical knowledge about tools and operations in a specific subject matter. Another type of mechanical ability test involves performance tests, such as The Hand-Tool Dexterity Test, which measures manipulative skill with materials and equipment that is important in factory jobs and industrial apprentice training.

Mechanical ability tests have proven to be valid in selection for several specific jobs. For example, Ghiselli (1973), as part of a larger study, reviewed the use of spatial and mechanical tests as well as motor ability tests in selection for eight occupations. Spatial and mechanical tests were valid for use with managerial, service, vehicle, trades and crafts, and industrial occupations. Motor ability tests were useful for service, vehicle, trades and crafts, and industrial occupations. Similarly, researchers in the Project A study found that perceptual-psychomotor ability correlated .53 with core technical proficiency and .57 with general task proficiency. This type of test is also useful for diagnosing deficiencies in workers in their knowledge of mechanical principles and techniques.

Clerical Ability Tests

Traditionally, clerical jobs have been designed with a large number of tasks which focused on bookkeeping, typing, filing, and recordkeeping. To a large extent, these tasks require the employee to extensively check or copy words and numbers and to create and maintain orderly systems for the keeping of files and reports. Therefore, clerical tests have predominantly measured perceptual speed and accuracy in the processing of verbal and numeric data. Perhaps the most widely used clerical test is the Minnesota Clerical Test, which has two separately timed and scored subtests: number checking and name checking. Each subtest has 200 items, consisting of a pair of numbers or a pair of names. The respondent is to compare the pair and place a check on a line between the two entries of the pair if these two entries are identical. Entries in the numbers subtest range from three through 12 digits, while the entries in the names subtest range from seven through 16 letters. The score on the Minnesota Clerical Test is the number right minus the number wrong. Reliability has been estimated at .90 for parallel forms and .85 for test-retest. Studies have demonstrated satisfactory validity coefficients for clerical, managerial, protective, trades and crafts, and industrial occupations (Ghiselli, 1973).

Application Forms

Application-form information is frequently used by organizations in the initial step of selection programs to make general assessments of job-related KSAs. However, because many application forms ask for superficial information about education, previous work history, and avocational activities, they have not demonstrated adequate validity in assessing KSAs. Therefore, research has identified specialized assessment forms that may be used in this initial stage of selection. One such form is Training and Experience (T&E) Evaluations (Ash, Johnson, Levine, & McDaniel, 1989). Although there are several different types of T&E forms, each type usually contains statements of specific, important job tasks which have been identified through job analysis. The applicant is asked to respond to each of these task statements by indicating his or her work experiences and/or training which relate to these tasks. In addition, the applicant must provide names of persons who can be contacted to verify this experience or training. This information is then graded by an organizational specialist using a scoring key. Such a scoring key usually takes into account the importance of each task for overall job performance, the extensiveness of previous experience and training, and the depth of reported knowledge.

The Weighted Application Blank (WAB) is a second type of useful application form (England, 1971). A WAB is actually a technique for scoring application forms rather than a separate assessment form. This procedure is composed of the following steps: (a) choosing a criterion (e.g., job tenure); (b) drawing large samples of high- and low-criterion groups from the organization's employment files (e.g., individuals with more than one year of tenure and individuals who terminated employment before one year); (c) selecting application-blank items (writing items based on application information which can be expressed as questions with multiple-choice response alternatives); (d) scoring the application blank of each individual in the sample on the items selected in the previous step; (e) determining item weights (using statistical analyses to identify those items for which high-criterion respondents answered differently from low-criterion respondents); (f) applying weighted items to another sample used as a holdout group. These steps result in a scoring key which can be applied to the application forms of future applicants with the resulting score indicating whether or not the applicant should be invited back for further employment assessment (Mumford & Owens, 1987).

Biographical Data forms are a third type of application assessment. These forms usually contain several hundred multiple-choice items which ask about the applicant's previous life experiences involving social, educational, work, and family interactions. Scoring keys are developed through multivariate statistical analyses using factor analyses, clustering, and discriminant function analysis. The result is the specification of a scoring key in which applicants are selected on the basis of the similarity between their self-reported previous life experiences and those of successful employees of the organization. Biographical data inventories are written based upon hypotheses as to which life experiences may be theoretically related to success in the activities of the job of interest. Items are most often written on the following topics: (a) habits and attitudes, (b) health, (c) human relationships, (d) financial characteristics, (e) parental home, childhood, teen years, (f) personal attributes, (g) present home, spouse, and children, (h) recreation, hobbies, and interests, (i) school and education, (j) self impressions, (k) values, opinions, and preferences, and (l) work.

All three of these assessment devices have demonstrated acceptable reliability (usually above .80) and validity. For example, different studies have determined validity coefficients of .45 for T&E forms (McDaniel, Schmidt, & Hunter, 1988) and .37 for Biographical Data (Reilly & Chao, 1982). Brown's (1978) study found that a WAB used in the insurance industry was able to predict both production and tenure during a 45-year time period.

Selection Interview

The interview has been one of the most often used selection devices, because it can be applied to applicants of all job groups. However, periodic reviews of the use of this device have found that the interview frequently is characterized by limited reliability and validity in the assessment of KSAs (Schmitt, 1976). These deficiencies have, in turn, led to extensive research which has tried to determine which factors have impeded both the interview process and the interviewer's decision making (Dipboye, 1992).

One source of difficulty with the interview has been its use in assessing a wide variety of characteristics of applicants (Gatewood & Feild, 1998). Among these characteristics are job knowledge, personality traits, future work motivation, adjustment to incumbent workers, verbal ability, and career development potential. While such a diversity of characteristics is theoretically possible to measure, reviews have agreed that there is evidence to support the assessment of only a limited number of these. Ulrich and Trumbo (1965) concluded that "... the interviewer is all too frequently asked to do the impossible because of limitations on the time, information, or both .... When the task was limited ... acceptable validity was achieved" (p. 114). Three main types of KSAs have been identified as appropriately measured in the interview: job knowledge, personal relations (sociability, verbal fluency, conflict resolution, etc.), and work habits (dependability, stability of performance of tasks, ability to coordinate simultaneous projects, etc.) (Schmitt, 1976).

Several factors, which are only peripherally related to the KSAs being assessed, have been shown to influence the interviewer's evaluation. Physical attractiveness of the applicant and personal liking of the applicant by the interviewer are two such factors (Keenan, 1977). A variety of nonverbal behaviors of the applicant, such as eye contact, head movement, smiling, hand motions, and general body posture, has also been identified (Dipboye, 1992). The disproportionate influence of any negative information on the overall evaluation has also been found (Rowe, 1963). Related to this is the finding that the interviewer frequently makes an overall assessment of the applicant within the first few minutes of the interview (Ulrich & Trumbo, 1965). In addition, the contrast effect, associated with previous applicants (Valenzi & Andrews, 1973), is among many information-processing factors that have been studied (Dreher & Sackett, 1983).

Because of the frequency of use of the selection interview and its traditionally low reliability and validity, it has been the subject of several equal-opportunity court cases. These cases have yielded opinions about specific features of the interview which are presumed to be sources of adverse impact: having all male and/or white interviewers, not using a structured or written interview format, not having stated criteria for employment decisions, and not using uniformity in applying selection criteria (Ledvinka & Scarpello, 1991). Also, research has determined that females are often given lower evaluations than comparable males when the jobs of interest are those frequently thought of as "male" jobs (Haetner, 1977).

As a result of the previously mentioned research and court decisions, there have been several suggestions to ensure that the interview is both reliable and valid. One of the most effective of these suggestions has been to use behaviorally based interview questions. One technique for developing such questions is the situational interview. The intent of this technique is to identify specific activities which represent important job tasks and use this information to form questions that ask an applicant how he or she would behave in the situation (Latham, Saari, Pursell, & Campion, 1980). Various studies of the situational interview have reported validity coefficients ranging from .30 to .46 and interrater reliability estimates between .76 and .87 (Harris, 1989). A second technique is the Behavioral Descriptive Interview (Janz, 1982). This type of interview is similar to the situational interview in its identification of job activities. However, resulting questions are different in that they ask about general performance behaviors rather than specific job situations. They also contain probe questions. Validity coefficients of .54 and .48 have been reported for the Behavioral Descriptive Interview (Harris, 1989).

A second, related suggestion for improvement has been to structure the interview. This consists of such steps as training all interviewers in a particular style of interviewing, identifying KSAs to be evaluated in each interview, specifying the main questions to be asked of all applicants, and developing a formal scoring system and decision rules. Meta-analytic studies have determined corrected validity coefficients of .62, .47, and .49 for the structured interview, indicating that it is roughly equivalent to cognitive ability tests for selection purposes (Gatewood & Feild, 1998).

Performance Tests

Performance tests are assessment devices that present testing situations that closely resemble actual job tasks and require the individual to complete some activity under structured testing conditions. In addition to their use in selection, these tests have also been extensively used in training in order to diagnose an individual's present KSAs and identify any deficiencies. The two most often used types of performance tests are work samples and assessment centers.

Work Sample Tests

Work sample tests have been used for the selection and training of clerical staff, technicians, and skilled crafts. Specific tests require such activities as typing, welding, wiring connections, cutting and nailing boards, and mixing chemicals. For example, Robinson (1981) describes a work sample test used for construction superintendents. An architect was retained to identify common architectural errors in blueprints and to incorporate these errors into the drawings of buildings which had actually been built by the company. Applicants were asked to review the blueprints and to mark the location of the errors with a felt-tipped pen on copies of the drawings.

Evaluations of the use of work sample tests have been very positive and have identified a number of benefits. Gordon and Kleiman (1976) compared correlations of work sample and paper-and-pencil ability tests with success in a police training course using three different groups of subjects. For each group, correlations of the work sample test with success were superior to those correlations using the ability tests. A meta-analytic study found a corrected validity coefficient of .54 between work samples and job performance (Hunter & Hunter, 1984). Additionally, Schmidt, Greenthol, Hunter, Berner, and Seaton (1977) compared scores of minority and nonminority applicants who were applying for positions as metal trade apprentices on both written tests and work samples. A difference in scores between the two groups was identified for only one of three work sample tests but for all five of the written tests. Finally, Cascio and Phillips (1979) found that work samples served as realistic job previews. That is, work sample tests provided a representative preview of the actual job, including positive and negative aspects. As a result, some applicants, finding the work sample to be unsuitable, removed themselves from the selection program. The net effect was to reduce the turnover rate among new hires, resulting in large savings for the organization.

Assessment Centers

An assessment center (AC) is a standardized evaluation of multiple behaviors using multiple assessment devices, many of which must be simulations (Task Force on Assessment Center Guidelines, 1989). These multiple behaviors are referred to as dimensions in the AC. Several trained assessors observe the candidates' performance during each of these assessment devices, meet to discuss each applicant's performance, and use these data to develop ratings for each dimension as well as an overall rating of goodness of performance.

Two of the most often used simulations in the AC are the In-Basket and the Leaderless Group Discussion (LGD). The In-Basket is a written simulation, of up to 30 memos, that is intended to replicate administrative tasks. In responding, the assessee writes the specific actions that he or she would take in reference to each memo, even naming individuals to be involved in these actions. Typically, three hours are allotted for completion. This device is frequently used to assess KSAs of planning, organizing, ability to delegate, decisiveness, independence, and initiative. The LGD is designed to represent those managerial behaviors that require the interaction of small groups of individuals to solve a problem. In the LGD, participants are tested in groups of six and seated around a conference table. Assessors are seated at various places around the participants to observe and record their behavior. The problem presented to the six-person group commonly is an allocation-of-resource dilemma in which there are more demands for the resource than there is supply. Each participant is asked to play a specific role and is provided background information. The group is usually given 1.5 hours to complete the task. Commonly, KSAs such as oral communication, tolerance for stress, persuasiveness, and adaptability are measured.

A meta-analysis (a statistical analysis for combining data across multiple samples) of ACs reported corrected validity coefficients of .53 for predicting career development and .36 for predicting immediate job performance (Gaugler, Rosenthal, Thornton, & Bentson, 1987). These results indicate that ACs measure KSAs that are related to long-term success within organizations. Another positive feature of ACs is their generally favorable support by courts in alleged discrimination cases. For example, in The Richmond Black Police Officers Association v. the City of Richmond the use of an AC for selection for supervisory positions in both the police and the fire departments was upheld. Evidence indicated that while the paper-and-pencil tests which were used in selection had severe adverse impact, the AC had no such effect and compensated for the adverse impact of the written tests.
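The "corrected validity coefficients" cited above for work samples and assessment centers come from meta-analyses that pool observed correlations across studies and then adjust them for statistical artifacts. The chapter does not say which corrections each meta-analysis applied, so the following is only a minimal sketch under stated assumptions: it pools studies by a sample-size-weighted mean and applies the classical correction for unreliability in the criterion measure. The function names, the study values, and the reliability figure are all hypothetical and are not taken from the cited studies.

```python
import math


def weighted_mean_r(studies):
    """Return the sample-size-weighted mean of observed validity coefficients.

    studies: list of (sample_size, observed_r) pairs.
    """
    total_n = sum(n for n, _ in studies)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(n * r for n, r in studies) / total_n


def correct_for_criterion_unreliability(observed_r, criterion_reliability):
    """Classical correction for attenuation: observed r divided by sqrt(r_yy)."""
    if not 0.0 < criterion_reliability <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return observed_r / math.sqrt(criterion_reliability)


# Hypothetical studies (NOT data from the chapter): (sample size, observed r).
hypothetical_studies = [(100, 0.30), (200, 0.40), (100, 0.50)]
# Hypothetical reliability of the job performance criterion (assumed value).
hypothetical_criterion_reliability = 0.64

mean_r = weighted_mean_r(hypothetical_studies)
corrected_r = correct_for_criterion_unreliability(
    mean_r, hypothetical_criterion_reliability
)
print(f"weighted mean observed r: {mean_r:.2f}")
print(f"corrected r: {corrected_r:.2f}")
```

With these made-up inputs the pooled observed coefficient is about .40 and the corrected value about .50. The corrected figure is always at least as large as the observed one, which is worth keeping in mind when comparing corrected and uncorrected coefficients across studies.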

PERSONALITY ASSESSMENT

There are several indications that personality should be a significant factor in job performance. First, job analysis has determined that many positions, such as receptionist, salesperson, and manager, require the incumbent to interact with others during most of the task activities. Other jobs, such as air traffic controller, law enforcement officer, and high school principal, require the incumbent to cope with many sources of stress. As a result, many job analysis methods (e.g., the Position Analysis Questionnaire) (McCormick & Jeanneret, 1988) include personality dimensions among the KSAs that are routinely identified. Second, empirical studies have noted differences in personality between successful and unsuccessful job performers. For example, Grimsley and Jarrett (1975) concluded that differences were evident between successful and unsuccessful managers in drive, energy, social adjustment, self-confidence, social aggressiveness, and emotional stability. Similarly, a longitudinal study of managers at AT&T found personality differences between those managers who had been promoted to middle-management positions and those who remained at lower managerial positions during an eight-year period (Bray, Campbell, & Grant, 1979).

Despite these indications, various reviews of the validity of personality assessments in predicting job performance have found very low empirical relationships between the two variables. Guion and Gottier (1965) stated that the number of significant findings in their review of validity coefficients was barely above the chance level of occurrence. They partially attributed these poor findings to methodological deficiencies in many of the validity studies. Similarly, Kinslinger (1966) found that the frequency of methodological shortcomings in studies using projective techniques prevented any positive judgment of their validity. Finally, Schmitt, Gooding, Noe, and Kirsch (1984) conducted a meta-analysis which determined a corrected validity coefficient of only .15 for self-report personality questionnaires.

However, in opposition to these reviews, Hogan (1991) has provided evidence strongly supporting the relationship of personality data to performance and argues that, to a large extent, previous findings were due to the use of inappropriate assessment devices and the lack of methodological rigor in the studies. Recent work has developed three theoretical and empirical arguments for the relationship of personality and job performance. First, there is a growing agreement among I/O psychologists and HR specialists that personality characteristics can be grouped into the broad personality dimensions referred to as the Big Five. In the past, researchers used a variety of personality traits in selection and, consequently, a variety of instruments, many of which focused on narrow personality traits. Agreement on the use of the Big Five provides consistency among researchers in terms of the constructs that are to be studied. Second, there is evidence that studies which are methodologically sound often yield significant, uncorrected validity coefficients of .30 and higher. Such coefficients are generally comparable to those of other, well accepted, selection instruments. Third, personality data have been found to be uncorrelated with cognitive ability and, therefore, can be additional predictors of job performance. In other words, personality data add unique variance to the prediction of job performance.

Self-Report Inventories

One of the most popular methods of assessing personality is through self-report questionnaires. These questionnaires present the respondent with a statement of feelings about an object, person, or activity and request the respondent to indicate agreement, neutrality, or disagreement with the statement. Several such measures of the Big Five have been developed. Of these multiple measures, we will describe the measure of the Big Five developed by Barrick and Mount (1991) because it has been used in several organizationally based studies. Their Personality Characteristics Inventory has 200 multiple-choice items derived from empirical research of several self-report personality inventories. Each of the 200 questions has three possible responses, "agree," "?," and "disagree." The inventory takes approximately 45 minutes to complete. The first dimension is Extraversion. Traits used to form this dimension include (from the positive pole) being sociable, gregarious, assertive, talkative, and active. The second dimension is Emotional Stability, characterized (from the negative pole) as being emotional, tense, insecure, nervous, excitable, apprehensive, and easily upset. The third dimension is Agreeableness, and is made up (positive pole) of being courteous, flexible, trusting, good-natured, cooperative, forgiving, softhearted, and tolerant. Conscientiousness is the fourth dimension and is composed (positive pole) of being responsible, organized, dependable, planful, willing to achieve, and persevering. The fifth dimension is Openness to Experience (sometimes referred to as Intellect or Culture in other work concerning the Big Five). This factor includes (positive pole) being imaginative, cultured, curious, intelligent, artistically sensitive, original, and broad-minded.

Among their several studies Barrick and Mount (1991) conducted a meta-analysis of the use of the Big Five in selection. Using data from 117 validity studies grouped across five occupations and three measures of job performance, they drew the following conclusions:

1. Conscientiousness is a valid predictor of job performance for all five occupational groups (corrected r of .20 to .23).
2. Extraversion and Emotional Stability were valid predictors for some, but not all, of the five occupational groups.
3. Agreeableness and Openness to Experience showed minimal validity, although these results may be partially attributable to the small number of studies which examined these two dimensions.
4. Conscientiousness was related to each of the three performance measures used. The other four personality factors were not related to all measures of performance.

The main conclusions from this work are that: (a) the factor of Conscientiousness appears to be the most important personality factor to be used in explaining job performance; (b) not all of the Big Five factors are related to job performance, perhaps explaining the weak empirical validity determined from previous reviews; and (c) personality, unlike cognitive ability, is not related to job performance in almost all jobs. Rather, specific factors of the Big Five are differentially related to specific job performance criteria within specific occupations.

Related studies have provided further information about the role of personality in work performance. Barrick, Mount, and Strauss (1993) found that sales representatives high in conscientiousness were more likely to set goals and were more likely to be committed to goals, which in turn was associated with greater sales volume and higher supervisory ratings of performance. Contrary to hypotheses, extraversion was not related to performance for these sales representatives. These findings indicate that goal-directed behavior is a stronger determinant of sales performance than is the ability to interact easily with others. A related study (Mount, Barrick, & Strauss, 1994) found that supervisors' and coworkers' ratings of conscientiousness and extraversion were valid predictors of performance ratings for a group of sales personnel, indicating that coworkers recognize the importance of personality factors in job performance. Finally, two other studies have provided some evidence that personality data may add validity above that associated with cognitive ability, although neither study used all the dimensions of the Big Five (Day & Silverman, 1989; Wright, Kacmar, McMahan, & Deleeuw, 1992).

Projective Instruments

Although self-report personality questionnaires are used extensively in organizations, there are instances of the use of projective instruments, especially in studies that involve high-level managers. Two instruments that have been used in such work are the Thematic Apperception Test (TAT) and the Miner Sentence Completion Scale (MSCS). In the TAT, the respondent is asked to tell a story about each of 19 cards that depict one or more individuals in a variety of ambiguous situations. It is assumed that the content of the individual's stories about these cards will reveal unconscious desires, inner tendencies, attitudes, and conflicts. The cards are administered individually in two one-hour sessions. The most often used scoring system for the TAT was developed by Murray and analyzes the hero (leading character in each story), the needs of the hero (such as achievement, order, and aggression), press (the pressures operating on the hero), and themes (the interplay among needs, press, and resolution of conflict) (Murray, 1943).

The MSCS was developed for the assessment of motives that are manifested in managerial work. The respondent is presented with 40 sentence fragments and asked to complete each (Miner, 1977). These items load on seven scales. Authority Figures provides a measure of the subject's capacity to meet role requirements in relationships with a superior. Competitive Games and Competitive Situations both focus on occupational or work-related competition.
The Assertive Role generally reflects confidence in one's ability to perform well and a wish to participate in activities. Imposing Wishes refers to controlling or directing the behavior of others. The Standing Out from the Group scale measures the desire to assume a somewhat deviant position as compared with subordinates. Finally, Routine Administrative Functions indicates the desire to meet job requirements of day-to-day administrative work.

Cornelius (1983) reviewed the use of projective tests in organizations and concluded that those tests which have defined scoring systems, especially the TAT and the MSCS, have demonstrated reliability of .80 or higher. He also reported that 10 of 14 studies appearing in academic journals between 1960 and 1981 reported significant validity coefficients between scores on projective measures of personality and job performance.

ASSESSMENT OF PHYSIOLOGICAL RESPONSES

There are three uses for the assessment of specific physiological characteristics of individuals in work situations: (a) to enhance selection decisions, (b) to assist in the diagnosis of poor performance of current employees, and (c) to provide information for medical insurance. There are four common types of assessment of physiological responses: vision, hearing, strength and coordination, and drug testing.

Vision Testing

Visual sensitivity includes several separate functions. For industrial work, the most important are color discrimination, near acuity at reading distance (13 to 16 inches), far acuity (usually measured at 20 feet), depth perception, and muscular balance of the eyes (phoria). The most common measure of vision is the Snellen Chart, which contains rows of letters of gradually decreasing size. It is intended to measure only far acuity. Accuracy in reading the letters of this chart has been found to be affected by many factors in a normal employment testing situation: amount of illumination, distance from chart, position of examinee's head, and so on. For this reason, more accurate and complete visual measures are taken by using specially designed instruments such as the Ortho-Rater, the AO Sight Screener, and the Keystone Telebinocular. These instruments provide measures of all the visual characteristics mentioned previously.

Hearing Testing

The most important aspect of hearing for industrial work is auditory acuity--the faintest sound that the individual can just barely hear. The most reliable measurement of this sound involves electronic audiometers. With these instruments, one ear at a time is tested. During testing, the subject receives the sound through a headphone pressed against the ear. The examiner increases the decibel level of the transmitted sound until the subject indicates that sound has been heard. This sound threshold is then remeasured by starting with a clearly audible sound and decreasing the decibel level until the subject reports no hearing. At each sound-wave frequency, the subject's hearing loss in decibels can be determined from the audiometer dial. This dial has been calibrated with a standard of "normal hearing" for the population. These normal hearing levels have been determined through testing a large, representative sample of people.

Strength and Coordination

There are two major taxonomies of physical abilities which identify those characteristics that are necessary for carrying out work assignments. The first is the Ability Requirements Scale which measures the following nine physical abilities (Fleischman & Mumford, 1988):

1. static strength--maximum force that can be exerted against external objects. Tested by lifting weights.
2. dynamic strength--muscular endurance in exerting force continuously. Tested by pull-ups.
3. explosive strength--ability to mobilize energy effectively for bursts of muscular effort. Tested by sprints or jumps.
4. trunk strength--limited dynamic strength specific to trunk muscles. Tested by leg-lifts or sit-ups.
5. extent flexibility--ability to flex or stretch trunk and back muscles. Tested by twist and touch test.

6. dynamic flexibility--ability to make repeated, rapid, flexing trunk movements. Tested by repeated rapid bending over and touching floor.
7. gross body coordination--ability to coordinate action of several parts of body while body is in motion. Tested by cable jump test.
8. gross body equilibrium--ability to maintain balance with nonvisual cues. Tested by rail-walk test.
9. stamina--capacity to sustain maximum effort requiring cardiovascular exertion. Tested by 600-yard run-walk.

The following validity coefficients for performance on specific jobs are among the results of studies which examined the use of these scales: pipeline workers (.63); correctional officers (.64); warehouse workers (.39); electrical workers (.53); and enlisted army personnel (.87). All coefficients are the product of a battery of two to four physical abilities correlated with job performance (Fleischman & Mumford, 1988).

The second taxonomy is the product of extensive work of Joyce Hogan who combined two lines of research. The first was data about physical requirements that were derived from job analysis and the second was data based on physical ability tests already developed for selection. Factor analyses were performed on several sets of data from these two sources. Results consistently identified three factors of physical abilities (Hogan, 1991). The first, muscular strength, is the ability to apply or resist force through muscular contraction. Within this factor are three, more specific, dimensions: muscular tension, muscular power, and muscular endurance. The second factor is cardiovascular endurance, which refers to the capacity to sustain gross muscular activity over prolonged periods. It is aerobic capacity and general systemic fitness involving the large muscles. The third factor, movement quality, concerns characteristics that contribute to skilled performance and also contains three dimensions: flexibility, balance, and muscular integration.

Specialists using results from these or other physical ability instruments must be especially concerned by the legal issues associated with testing three groups of individuals: females, disabled, and older workers. Adverse impact for scores on physical ability tests is common for each group. Frequently, males will score higher than females on such tests, nondisabled persons higher than disabled, and younger persons higher than older. The essential issue, therefore, is that the tests must clearly be linked to critical job tasks, which require these physical abilities in their completion. However, even this principle is complicated by the question of whether the tasks can be modified to reduce or eliminate physical demands. If such modifications can be made, the use of the physical ability test which is appropriate for the original tasks may be unwarranted when used with the modified job activities.

Drug Testing

According to some estimates, drug abuse costs organizations over $30 billion annually through absenteeism, mistakes, damage, injury, and sick leave. In response, organizations are increasing the use of drug testing for both applicants and incumbent workers. The most common test is The Immunoassay Test, which attempts to determine whether drugs are in a person's system on the basis of the reaction of the urine specimen to certain antibodies created by the immune systems of laboratory animals. This test can detect both the presence or absence and the amount of drugs in a person's system. Another, more precise, test is the Gas Chromatography/Mass Spectrometry Test, which can separate complex mixtures of drugs and other substances into their pure parts. When a mixture, such as an extract of drugs from urine, is injected into the testing instrument, each drug will move through the instrument in gas form at a different speed. When a particular drug reaches the end of the instrument and enters the mass spectrometer, it is separated from other drugs and is in a pure form. Therefore, this test can identify a specific drug from a group of similar drugs.

Although these tests are quite reliable and valid in their identification of drugs in an individual, there are several legal issues accompanying their use. For one, there is no definite relationship between the presence of drugs and work performance. The test does not indicate how much drug was ingested, when it was ingested, or its effect on physical processes and cognitive functions. Second is the issue of invasion of privacy. By their very nature, drug assessment procedures are intrusive upon the individual. When such intrusiveness is coupled with the tests' inability to measure decreases in work performance, one argument against drug testing is that unless there is evidence of the individual's inability to perform work safely and efficiently, employers are invading the off-work time of employees and are attempting to regulate their private lives. For this reason it is important that testing be linked to workplace problems such as accidents, theft, absenteeism, and sabotage. Data that indicate the existence of such problems before testing can serve as evidence that the company is pursuing a legitimate self-interest.

ASSESSMENT OF WORK ATTITUDES

As mentioned previously, data about employees' attitudes are used to assess strengths and weaknesses in both the organization's programs and its physical facilities. Scarpello and Vandenberg (1992) have summarized guidelines for the construction and use of attitude-assessment instruments which gather data for these purposes. Because space limitations do not permit us to discuss each of the large number of specific attitudes which have been assessed, we will only discuss two of the most often studied attitudes: job satisfaction and organizational commitment.

Job Satisfaction

Locke (1976) defined job satisfaction "...as a pleasurable or positive emotional state resulting from the appraisal of one's job or job experiences" (p. 1300). While this definition is generally accepted by both researchers and practitioners, there is no certainty that the measurement of satisfaction reflects this definition, according to Organ and Near (1985). They point out that job satisfaction is typically measured with instruments modeled after attitude scales. Because psychologists have viewed attitudes as an assemblage of cognitive, emotional, and action tendencies, factors other than emotional states have entered into the measurement of satisfaction. After reviewing these instruments, Organ and Near (1985) concluded that "...the items on most job attitude scales tend to focus on the job itself or the factors of the job (e.g., supervision, the task, pay), not on the feelings of the respondent" (p. 244). The wording and format of items on many job-satisfaction surveys actually require a cognitive evaluation of the work situation. For example, the respondent may be asked to judge various aspects of the job relative to how much of this aspect there should be in an ideal job or how much of this aspect he or she expected from the job.

Recent research has provided some explanation of the differences in the affective and cognitive facets of job satisfaction. Olson and Zanna (1993) discuss that, while the view of attitudes as cognitions, affective reactions, and behaviors provides a useful framework for examining attitudes, not every attitude must contain each of these three components. Rather, specific attitudes toward the same object may be based on either cognitive, affective, or behavioral antecedents. For example, Edwards (1990) induced subjects to form either affect-based or cognition-based attitudes through the experimental manipulation of varying the order of reading about and tasting a soft drink. Millar and Millar (1990) also classified subjects as possessing either affect-based or cognitive-based attitudes toward the same object. It seems, therefore, that job satisfaction can be an affective and/or a cognitive attitude, each of which is based upon different antecedents in the workplace. For example, Judge (1993) found support for the proposition that an individual's disposition to be satisfied with everyday life events moderates the relationship between affective job satisfaction and organizational outcomes. That is, the more positive the disposition of the individual, the stronger was the relationship between job dissatisfaction (affective attitude) and turnover.

The instruments used to measure job satisfaction are generally classified into two groups: measures of overall satisfaction and measures of satisfaction of specific job facets. Both types typically employ Likert-type items as measuring devices. Cook, Hepworth, Wall, and Warr (1981) reviewed and illustrated many of the currently used major instruments in the following manner. Measures of overall satisfaction differ in terms of the number of items and the content of questions; items range from four to 38 while the content varies from questions about a worker's emotional reactions toward the job as a whole, to cognitive reactions of organizational or supervisory functioning, to evaluations of specific intrinsic and extrinsic features of jobs. The score on measures of overall satisfaction is obtained by summing the scores of all individual items. Reported reliabilities (generally internal consistency) are usually at least .80. Most measures also report criterion and construct validity in the form of relationships with other organizational variables.
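The scoring rule described above (summing Likert-type item scores) and the internal consistency reliabilities reported for these measures can be illustrated with a short sketch. It assumes that internal consistency is estimated with Cronbach's alpha, a common choice that the chapter does not name explicitly. The function names and the response data are hypothetical and do not come from any instrument discussed here.

```python
from statistics import variance


def total_scores(responses):
    """Sum the item scores for each respondent (one row per respondent)."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Estimate internal consistency with Cronbach's alpha.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals),
    where k is the number of items.
    """
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    item_columns = list(zip(*responses))  # transpose rows into item columns
    sum_item_variances = sum(variance(column) for column in item_columns)
    total_variance = variance(total_scores(responses))
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (n_items / (n_items - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: five respondents, four items on a 5-point scale.
hypothetical_responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]

print("overall satisfaction scores:", total_scores(hypothetical_responses))
print("alpha:", round(cronbach_alpha(hypothetical_responses), 2))
```

Alpha rises as items covary more strongly relative to their individual variances, so a value of .80 or above suggests that the summed items are largely measuring the same construct.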

Table 20.3. SCALES OF JOB SATISFACTION FACET MEASURES

supervision             social needs
company as a whole      autonomy
nature of work          personal growth
extent of work          esteem needs
coworkers               subordinates
working conditions      intrinsic rewards
pay                     extrinsic rewards
promotions              friend's attitudes
security                family attitudes

Note: Cook, Hepworth, Wall, & Warr (1981).

Figure 20.1. Model of Antecedents and Consequences of Job Satisfaction

[Figure not reproduced. Antecedents shown: work role inputs; utility of direct and opportunity costs; frames of reference for evaluating job outcomes; work role outcomes. Central variable: job satisfaction. Consequences shown: intentions regarding job outcomes; intentions regarding job inputs; intentions regarding work role inclusion; intentions to remain in charge of work situation.]

A variety of topics (see Table 20.3) is measured in job facet satisfaction, with each instrument being composed of a different combination of facets. The facets that are included most often in these questionnaires are satisfaction with pay, promotions, supervision, and job content. As with the overall satisfaction measures, most of the reported reliabilities for facet measures are internal consistency estimates, which most often are at least .80. Validity data for measures of facet satisfaction are usually reported in terms of correlations with other satisfaction measures and correlations among the various scales of the instrument.

Much research has examined the organizational correlates of job satisfaction. Roznowski and Hulin (1992) argue that, while assessment of cognitive ability is the most important information that an organization can have about individuals prior to organization entry, measures of job satisfaction are similarly important after-entry data. This is because job satisfaction has been demonstrated to influence variables, such as employee attendance at work, decisions to leave an organization, decisions to retire, general behavioral syndromes reflecting pro-organizational orientations (organizational citizenship), attempts to change work situations by voting for union representation, and psychological withdrawal behaviors.

Figure 20.1 is a variation of a model proposed by Roznowski and Hulin to explain the factors which influence job satisfaction and the variables which are affected, in turn, by satisfaction. Briefly, external employment and economic factors, an individual's input into his or her work, the outcomes given by the organization for this work, and the individual's evaluation of these outcomes lead to the individual's affective satisfaction/dissatisfaction with work. Such affective reaction directly influences the individual's subsequent behavior. Satisfied and dissatisfied workers react differently to the same work situation. Organ and Konovsky (1989) suggest that the cognitive component of job satisfaction is responsible for premeditated, intentional, and sustained contributions to the organization. This is because the satisfied individual wishes to maintain fairness in the social exchange relationship between him or her and the organization. On the other hand, dissatisfied workers will do something to reduce their negative feelings. Because the actions that are employed to reduce these negative feelings are often detrimental to the well-being of the organization, it is important that the organization be aware of job dissatisfaction through regularly administered measures of employees' attitudes. The actions taken by individuals in response to dissatisfaction usually include attempts to increase work outcomes, attempts to decrease work inputs, attempts to reduce work role inclusion, and formal attempts to change the job/work role.

Organizational Commitment

Steers (1977) defined commitment to an organization as the relative strength of an individual's identification with and involvement in that organization. He also discussed three factors of such identification. One is an individual's strong belief and acceptance of the organization's goals. A second is the individual's willingness to exert considerable effort on behalf of the organization. The third is the individual's desire to maintain membership in the organization.

Based on this definition, organizational commitment is frequently measured by multiple-item scales, such as the one described by Porter, Steers, Mowday, and Boulian (1974).

    This 15-item questionnaire was designed to measure...commitment .... Included in this instrument are items pertaining to the subject's perceptions concerning his loyalty toward the organization, his willingness to exert a great deal of effort to achieve organizational goals, and his acceptance of the organization's values. All items represent statements to which the subject responds on 7-point Likert scales, ranging from "strongly disagree" to "strongly agree" (p. 605).

Marrow (1993) has reviewed studies of commitment and concludes that there are two perspectives in the academic community as to the nature of commitment. One perspective focuses on calculative/continuance commitment which centers on the exchange components of the employee-organization relationship. That is, this type of commitment focuses on both perceived ease of movement by the individual from the present organization to another one, and also on those variables which might be lost if a person were to leave an organization (e.g., seniority, relationships, retirement benefits, and insurance). According to this perspective, the individual is bound to the organization because of these variables. Calculative/continuance commitment has been correlated with a number of antecedents and outcomes. For example, age of respondent, organizational tenure, and level within the organization are all significantly, if weakly, correlated to this type of commitment. Other research indicates that employees may form calculative/continuance commitment with an organization based on both their feelings about the organization and the costs of leaving. That is, provision of an excellent health care insurance policy might engender feelings of loyalty based on a positive emotional attachment to the organization (i.e., the organization really cares about me) or a more "sunk costs" orientation (i.e., few firms could offer me an equivalent policy). Reliabilities of measures of this construct were usually reported at .80+.

Affective commitment is the second form of commitment and, conceptually, is identical to the construct defined by Porter and colleagues (1974). Their measure of commitment is by far the most extensively used instrument to measure this construct. Its reported internal consistency reliability averaged .88 over 90 samples (Mathieu & Zajac, 1990). Factor analysis of the instrument yields a single-factor solution that is independent of work ethic, career commitment, professional commitment, and job involvement (Marrow, 1993). Affective commitment has been the subject of extensive empirical research which, in turn, has served to more precisely define the construct. For example, affective commitment is positively related to age, organizational tenure, group cohesion, communication, more complex job design, and job satisfaction. Negative relationships have been found with education, stress, absenteeism, intention to leave an organization, and actual turnover. Because of this extensive research, affective commitment is considered to be an important work attitude which is critical in understanding employees' behaviors and performance.

ASSESSMENT OF WORK PERFORMANCE

Assessments of individual and group work performance are important for organizations because they directly relate to the organization's profitability, competitive advantage, and survival. Therefore, one use of such performance data is in formulating organizational strategy and setting goals. A second use is to validate recruitment, selection, and training programs. Third, performance data are also prominent in the distribution of compensation among employees. Although there are several different measures of work performance, we can classify these measures into three main groups.

Production Data

Production data consist of the results of work. The data comprise things that can be counted, seen, and compared directly from one worker to another. Related terms that have been used to describe these data are output, objective, and nonjudgmental performance measures. These measures are usually based upon the specific job tasks and quite different measures can be used for the same job title. Usually the various measures reflect some aspect of either quantity or quality of completed product/service, for example, number of units produced per week, dollar volume of sales, profit of unit, number of errors per hour, weight of scrap, and number of complaints from customers.

These data are often considered to be the most desirable measure of performance for several reasons. First, such data are often easy to gather because they are routinely collected for business operations such as production planning and budgeting. Also, the importance of such measures is thought to be obvious.

... attempt to adjust specific sales data to control these differences but such adjustments are usually based upon opinions rather than statistical data and are subject to measurement limitations themselves.

Employee Data

Absenteeism, turnover, grievances, accidents, and promotions are the most common variables in the second type of performance data, employee data. These variables have characteristics which are similar to those of production data. For one, they are often routinely collected; however, unlike production data, this collection is usually at the individual level. Also, data are countable and appear to be free of bias. Finally, they are generally accepted as being important parts of work behavior.

However, similar to production data, employee data also have important measurement limitations. First, there is often very little variance in these measures because the large majority of workers have very few absences, accidents, grievances, or promotions. This is especially true for data that are collected for short time periods. Second, each of
Finally, these data are the variables in this type of data have several dif- understood and accepted by workers as indicators ferent operationalizations which are not inter- of their performance. changeable. Therefore, the measurement specialist However, there are measurement limitations in must be clear about the construct that is to be mea- the collection of production data which should be sured. For example, absenteeism has been mea- considered before they are used. For one, these sured as: the number of separate instances, the total data can be deficient because the measure usually number of days taken, the number of short (one-or focuses on only one aspect of performance (e.g., two-day) absences, and the number of Monday volume of sales), and does not include other absences (Landy & Pharr, 1983). Similarly, turn- important aspects (e.g., service after sale). Also, over can be either voluntary (the individual ini- production data are often collected for work groups tiates the separation) or involuntary (the or administrative units rather than for individuals. organization initiates the separation). The two are While group-level performance data are certainly quite different in terms of the construct being mea- appropriate for some HR decisions within specific sured. Finally, employee data can also be deficient managerial frameworks (e.g., Total Quality Man- as a measure of overall work performance because agement), they are inappropriate for others (e.g., they do not directly address either quantity or qual- individual achievement and bonus plans). Finally, ity of production/service. there are differences in work situations among individuals that lead to nonstandardized assess- ment conditions among employees. 
For example, Judgmental Data sales personnel are usually assigned to territories that differ greatly in terms of sales opportunities, In judgmental data an individual familiar with economic purchasing power of residents, and the the work of another is required to judge this number of competitors. As a result, the sales vol- work. In most cases this individual is the imme- ume of a salesperson may more strongly reflect diate supervisor. However, 360-Feedback, which characteristics of the territory rather than charac- is becoming increasingly popular, uses data from teristics of the individual. Organizations frequently subordinates, peers, and supervisors to gather TESTING AND INDUSTRIAL APPLICATION 521

complete data about an individual's work performance. Because judgmental data are opinions of individuals, a number of measurement limitations must be addressed before such data can be used. For example, Murphy and Cleveland (1991) have identified a number of errors that are frequently made by raters (halo, leniency, central tendency, severity) that reduce both reliability and validity. They also discuss a number of effective steps which can be taken to limit these rating errors and improve the psychometric properties of the instrument used. One step is to develop the performance appraisal system in such a way that the raters are asked to evaluate job behaviors rather than personal traits of individuals. Table 20.4 presents examples of both behaviors and traits that have been used in judgmental performance review. Research has determined that the use of traits is unacceptable to all parties involved in the performance review. That is, neither the rater nor the ratee is receptive to the rating of personal characteristics. In addition, the theoretical basis for the presumed relationship between the traits and performance has not been well specified or empirically verified. Because of these deficiencies in the evaluation of traits, judgmental data have focused on the individual's performance of specified job behaviors.

Table 20.4. Dimensions Used in Judgmental Performance Review

TRAIT DIMENSIONS SCALES

DIMENSIONS     UNSATISFACTORY   MARGINAL   SATISFACTORY   GOOD   SUPERIOR
Attitude             1              2            3          4        5
Enthusiasm           1              2            3          4        5
Cooperation          1              2            3          4        5
Motivation           1              2            3          4        5

BEHAVIORALLY ANCHORED RATING SCALE
PERFORMANCE DIMENSION: INTERACTING WITH BANK CUSTOMERS

SCALE POINT   BEHAVIOR
     7        Employee smiles, greets customer by name, asks to be of service
     6        Employee greets customer by name, asks to be of service
     5        Employee smiles, asks to be of service
     4        Employee asks to be of service
     3        Employee greets customer
     2        Employee smiles at customer
     1        Employee remains silent until customer speaks

BEHAVIORAL OBSERVATION SCALE
PERFORMANCE DIMENSION: REVIEWS SUBORDINATES' WORK PERFORMANCE

 1. Communicates mistakes made in job activities by subordinates.
        Almost Never   1   2   3   4   5   Almost Always
 2. Praises subordinates for good work behavior.
        Almost Never   1   2   3   4   5   Almost Always
 3. Discusses hindrances in completing projects.
        Almost Never   1   2   3   4   5   Almost Always
    ...
11. Inspects quality of output materials.
        Almost Never   1   2   3   4   5   Almost Always
12. Reviews inventory of necessary parts and equipment.
        Almost Never   1   2   3   4   5   Almost Always

Total Score ______
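The Behavioral Observation Scale in Table 20.4 ends in a "Total Score" line. The sketch below shows one way that scoring step could be carried out. It assumes the total is the unweighted sum of the 1-to-5 frequency ratings, which the table suggests but does not state; the function name `bos_total_score` and the example ratings are hypothetical illustrations, not part of any published instrument.

```python
from typing import Mapping

# Response format from Table 20.4: 1 = Almost Never, 5 = Almost Always.
SCALE_MIN, SCALE_MAX = 1, 5


def bos_total_score(ratings: Mapping[str, int]) -> int:
    """Sum the frequency ratings given to one ratee on one BOS dimension.

    Assumption: the Total Score is an unweighted sum of the item ratings.
    """
    for item, rating in ratings.items():
        if not SCALE_MIN <= rating <= SCALE_MAX:
            raise ValueError(
                f"rating for {item!r} must be between {SCALE_MIN} and {SCALE_MAX}"
            )
    return sum(ratings.values())


# Hypothetical ratings for the first three items shown in Table 20.4.
example_ratings = {
    "Communicates mistakes made in job activities by subordinates": 4,
    "Praises subordinates for good work behavior": 5,
    "Discusses hindrances in completing projects": 3,
}

total = bos_total_score(example_ratings)
print(f"Total Score: {total}")
```

If an organization scores the scale differently, for example with a mean or with weighted items, only the final `sum` line would need to change.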

The two most frequently used types of job behavior scales are Behaviorally Anchored Rating Scales (BARS) and Behavioral Observation Scales (BOS). Examples of both of these types of scales are presented in Table 20.4. Development of BARS includes gathering descriptions of specific work behaviors, sorting these incidents into dimensions of work performance, and determining the scale points to be assigned to each incident within a dimension (Bernardin & Beatty, 1984). When using this type of scale, the rater is asked to circle the behavior of each dimension which reflects the actual work behavior of the individual being evaluated. The individual being evaluated, therefore, is assigned the points associated with the behavior which is circled.

The BOS technique is similar to BARS in that descriptions of specific work behaviors are gathered and sorted into separate dimensions. However, multiple scales are produced for each work dimension. That is, the specific work behaviors which have been sorted into each dimension are used to form a number of scales for that dimension. As is indicated in Table 20.4, each of these behavior scales uses a measure of frequency as its response format. Therefore, when the rater uses the BOS technique, he or she reviews each of these scales and rates the frequency of occurrence of each behavior in the BOS instrument in the work performance of the individual being evaluated (Latham & Wexley, 1981).

Because judgmental performance data are extensively used in many organizational programs (e.g., compensation, promotion, termination, training), they have been part of several court disputes. These court cases have produced rulings that comment on specific characteristics of judgmental performance systems. According to McEvoy and Beck-Dudley (1991), characteristics that courts have regarded as being necessary for defense against illegal discrimination are:

1. presence of written instructions given to raters which describe the performance review system.
2. use of behaviorally based dimensions for judgment rather than trait dimensions.
3. basing the behaviorally based dimensions on formal job analysis.
4. having employees review the appraisal results in a formal session in which feedback is given by the supervisor.
5. presence of an appeal channel for use by employees who wish to disagree with the ratings of a supervisor.
6. having multiple raters, such as the employee's immediate supervisor and a manager one level above the immediate supervisor, participate in the evaluation.

SUMMARY

Assessment of employee characteristics is necessary for development and implementation of various programs that seek to improve the work performance and psychological reactions to work of employees. The five major classes of employee characteristics that have been assessed for such programs are: knowledge, skills, and abilities, which are especially important for the performance of the technical portions of tasks; personality, which is necessary for work group interaction; physiological attributes, which are essential for acquiring information and completing the physical components of tasks; attitudes, which reflect an individual's emotional reactions to or cognitive evaluations of features of the job or the organization; and work performance, which is a primary indicator of the employee's contribution to the success of the organization and has been linked to the organization's ability to function in the competitive market.

As this chapter describes, in addition to development of organizational programs, scientists have used data from assessment of these five characteristics of individuals to study the relationships of these five with numerous other variables. The results of these studies have produced an understanding of the antecedents and consequences of individual work attitudes and work performance. Such understanding, in turn, has contributed to our understanding of human behavior.

REFERENCES

Allport, G. (1961). Pattern and growth in personality. New York: Holt, Rinehart & Winston.
Anastasi, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
Ash, R., Johnson, J., Levine, E., & McDaniel, M. (1989). Job applicant training and work experience evaluation in personnel selection. In K. Rowland & G. Ferris (Eds.), Research in personnel
and human resource management. Greenwich, CT: JAI Press.
Barrick, M., & Mount, M. (1991). The big five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1-26.
Barrick, M., Mount, M., & Strauss, J. (1993). Conscientiousness and performance of sales representatives: Test of the mediating effects of goal setting. Journal of Applied Psychology, 78, 715-722.
Bernardin, H., & Beatty, R. (1984). Performance appraisal: Assessing human behavior at work. Boston: West Publishing Co.
Borman, W. C. (1991). Job behavior, performance, and effectiveness. In M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2). Palo Alto, CA: Consulting Psychologists Press.
Bray, D. W., Campbell, R. J., & Grant, D. L. (1979). Formative years in business. Huntington, NY: Robert E. Krieger Publishing.
Brown, S. (1978). Long-term validity of a personal history item scoring procedure. Journal of Applied Psychology, 63, 673-676.
Campbell, J. (1990). An overview of the army selection and classification project (Project A). Personnel Psychology, 43, 243-257.
Cascio, W., & Phillips, N. (1979). Performance testing: A rose among thorns? Personnel Psychology, 32, 751-766.
Cook, J. D., Hepworth, S. J., Wall, T. D., & Warr, P. B. (1981). The experience of work. New York: Academic Press.
Cornelius, E. T. (1983). The use of projective techniques in personnel selection. In K. Rowland & G. Ferris (Eds.), Research in personnel and human resources management. Greenwich, CT: JAI Press.
Day, D., & Silverman, S. (1989). Personality and job performance: Evidence of incremental validity. Personnel Psychology, 42, 25-36.
Dipboye, R. (1992). Selection interviews: Process perspectives. Cincinnati, OH: South-Western.
Dreher, G., & Sackett, P. (1983). Commentary. In G. Dreher & P. Sackett (Eds.), Perspectives on employee staffing and selection. Homewood, IL: Richard D. Irwin.
Edwards, K. (1990). The interplay of affect and cognition in attitude formation and change. Journal of Personality and Social Psychology, 59, 202-216.
England, G. (1971). Development and use of weighted application blanks. Minneapolis, MN: Industrial Relations Center, University of Minnesota.
Fleishman, E., & Mumford, M. (1988). Ability requirements scales. In S. Gael (Ed.), Job analysis handbook for business, industry, and government (Vol. 2). New York: Wiley.
Gatewood, R., & Feild, H. (1998). Human resource selection (4th ed.). Fort Worth, TX: Dryden Press.
Gaugler, B., Rosenthal, D., Thornton, G., & Bentson, C. (1987). Meta-analysis of assessment center validity. Journal of Applied Psychology, 72, 493-511.
Ghiselli, E. E. (1973). The validity of aptitude tests in personnel selection. Personnel Psychology, 26, 461-477.
Gordon, M. F., & Kleiman, L. S. (1976). The prediction of trainability using a work sample test and an aptitude test: A direct comparison. Personnel Psychology, 29, 243-253.
Grimsley, G., & Jarrett, H. (1975). The relation of past managerial achievements to test measures obtained in the employment situation: Methodology and results-II. Personnel Psychology, 28, 215-231.
Guion, R. M., & Gottier, R. F. (1965). Validity of personality measures in personnel selection. Personnel Psychology, 18, 135-164.
Haefner, J. (1977). Race, age, sex, and competence as factors in employer selection of the disadvantaged. Journal of Applied Psychology, 62, 199-202.
Harris, M. (1989). Reconsidering the employment interview: A review of recent literature and suggestions for future research. Personnel Psychology, 42, 691-726.
Hogan, J. (1991). Physical abilities. In M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2). Palo Alto, CA: Consulting Psychologists Press.
Hogan, R. (1992). Personality and personality measurement. In M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 873-919). Palo Alto, CA: Consulting Psychologists Press.
Hulin, C. (1991). Adaptation, persistence, and commitment in organizations. In M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2). Palo Alto, CA: Consulting Psychologists Press.
Hunter, J. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of Vocational Behavior, 29, 340-362.
Hunter, J., & Hunter, R. (1984). Validity and utility of alternative predictors of job performance. Psychological Bulletin, 96, 72-88.
Hunter, J. E., Schmidt, F. L., & Hunter, R. (1979). Differential validity of employment tests by race: A comprehensive review and analysis. Psychological Bulletin, 85, 721-735.
Janz, T. (1982). Initial comparisons of patterned behavior description interviews versus unstructured interviews. Journal of Applied Psychology, 67, 577-580.
Judge, T. (1993). Does affective disposition moderate the relationship between job satisfaction and voluntary turnover? Journal of Applied Psychology, 78, 395-401.
Keenan, A. (1977). Some relationships between interviewers' personal feeling about candidates and their general evaluation of them. Journal of Occupational Psychology, 50, 275-283.
Kinslinger, H. J. (1966). Application of projective techniques in personnel psychology since 1949. Psychological Bulletin, 66, 134-149.
Landy, F., & Pharr, J. (1983). The measurement of work performance: Methods, theory, and applications. New York: Academic Press.
Latham, G., Saari, L., Pursell, E., & Campion, M. (1980). The situational interview. Journal of Applied Psychology, 65, 422-427.
Latham, G., & Wexley, K. (1981). Increasing productivity through performance appraisal. Reading, MA: Addison-Wesley Publishing Co.
Ledvinka, J., & Scarpello, V. (1991). Federal regulation of personnel and human resource management (2nd ed.). Boston: PWS-Kent Publishing Company.
Locke, E. A. (1976). The nature and causes of job satisfaction. In M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology. Chicago: Rand McNally College Publishing Co.
Marrow, P. (1993). The theory and measurement of work commitment. Greenwich, CT: JAI Press Inc.
Mathieu, J. E., & Zajac, D. M. (1990). A review and meta-analysis of the antecedents, correlates, and consequences of organizational commitment. Psychological Bulletin, 108, 171-194.
McCormick, E. J., & Jeanneret, P. R. (1989). Position Analysis Questionnaire (PAQ). In S. Gael (Ed.), The job analysis handbook for business, industry, and government. New York: Wiley.
McDaniel, M., Schmidt, F., & Hunter, J. (1988). A meta-analysis of the validity of methods for rating training and experience in personnel selection. Personnel Psychology, 41, 283-314.
McEvoy, G., & Beck-Dudley, C. (1991). Legally defensible performance appraisals: A review of federal appeals court cases. Sixth Annual Conference of the Society for Industrial and Organizational Psychologists.
Millar, M., & Millar, K. (1990). Attitude change as a function of attitude type and argument type. Journal of Personality and Social Psychology, 59, 217-228.
Miner, J. B. (1977). Motivation to manage: A ten-year update on the "studies in management education" research. Atlanta, GA: Organizational Measurement Systems Press.
Mount, M., Barrick, M., & Strauss, J. (1994). Validity of observer ratings of the big five personality factors. Journal of Applied Psychology, 79, 272-280.
Mumford, M., & Owens, W. (1987). Methodology review: Principles, procedures, and findings in the application of background data measures. Applied Psychological Measurement, 11, 2-6.
Murphy, K., & Cleveland, J. (1991). Performance appraisal: An organizational perspective. Boston: Allyn & Bacon.
Murray, H. (1943). Thematic apperception test. Cambridge: Harvard University Press.
Olson, J., & Zanna, M. (1993). Attitudes and attitude change. Annual Review of Psychology, 44, 117-154.
Organ, D. W., & Konovsky, M. (1989). Cognitive versus affective determinants of organizational citizenship behavior. Journal of Applied Psychology, 74, 157-164.
Organ, D. W., & Near, J. P. (1985). Cognition vs. affect in measures of job satisfaction. International Journal of Psychology, 20, 241-253.
Pearlman, K., Schmidt, F., & Hunter, F. (1980). Validity generalization results for tests used to predict job proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373-406.
Porter, L., Steers, R. T., Mowday, R. T., & Boulian, P. V. (1974). Organizational commitment, job satisfaction and turnover among psychiatric technicians. Journal of Applied Psychology, 59, 603-609.
Reilly, R., & Chao, G. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35, 1-62.
Reilly, R., & Warech, M. (1991). The validity and fairness of alternatives to cognitive tests. In L. Wing (Ed.), Employment testing and public policy. Boston: Kluwer.
Robinson, D. (1981). Content-oriented personnel selection in a small business setting. Personnel Psychology, 34, 77-78.
Rowe, P. (1963). Individual differences in selection decisions. Journal of Applied Psychology, 47, 305-307.
Roznowski, M., & Hulin, C. (1991). The scientific merit of valid measures of general constructs with special reference to job satisfaction and job withdrawal. In C. J. Cranny, P. C. Smith, & E. F. Stone (Eds.), Job satisfaction. New York: Lexington Books.
Scarpello, V. G., & Vandenberg, R. (1992). Some issues to consider when surveying employees' opinions. In J. Jones, B. Steffy, & D. Bray (Eds.), Applying psychology in business: The manager's handbook. Lexington, MA: Lexington Books.
Schmidt, F., Greenthal, A., Hunter, J., Berner, J., & Seaton, F. (1977). Job sample vs paper-and-pencil trade and technical tests: Adverse impact and examiner attitudes. Personnel Psychology, 30, 187-197.
Schmidt, F., Gast-Rosenberg, I., & Hunter, J. (1980). Validity generalization for computer programmers. Journal of Applied Psychology, 65, 643-661.
Schmidt, F., Hunter, F., Pearlman, K., & Shane, T. (1979). Further tests of the Schmidt-Hunter Bayesian validity generalization procedure. Personnel Psychology, 32, 257-281.
Schmidt, F., Hunter, J., & Caplan, J. (1981). Validity generalization results for two job groups in the petroleum industry. Journal of Applied Psychology, 66, 261-273.
Schmitt, N. (1976). Social and situational determinants of interview decisions: Implications for the employment interview. Personnel Psychology, 29, 79-101.
Schmitt, N., Gooding, R., Noe, R., & Kirsch, M. (1984). Metaanalysis of validity studies published between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37, 407-422.
Steers, R. M. (1977). Organizational effectiveness: A behavioral view. Santa Monica, CA: Goodyear.
Task Force on Assessment Center Guidelines (1989). Guidelines and ethical considerations for assessment center operations. Public Personnel Management, 18, 457-470.
Ulrich, L., & Trumbo, D. (1965). The selection interview since 1949. Psychological Bulletin, 63, 100-116.
Valenzi, E., & Andrews, I. R. (1973). Individual differences in the decision process of employment interviewers. Journal of Applied Psychology, 58, 49-53.
Wright, P., Kacmar, M., McMahan, G., & Deleeuw, K. (1992). P=f(M x A): Cognitive ability as a moderator of the relationship between personality and job performance. Academy of Management Best Paper Proceedings. Madison, WI: Omnipress.

CHAPTER 21

PSYCHOLOGICAL ASSESSMENT OF ETHNIC MINORITIES

Antonio E. Puente Miguel Perez Garcia

INTRODUCTION

Whereas the study of abnormal behavior through the use of scientifically based psychological instruments has a relatively lengthy and interesting history (see introductory chapter), the assessment of individuals who fall toward the edges of the "bell curve" poses unique social, political, and scientific challenges (Olmedo, 1981; Scarr, 1988). Traditionally, approaches to the study of individuals who are considered outside the mainstream of whatever society they belong to have been politically and socially based. The study of these individuals was grounded on the assumption that it is morally correct to understand them (Fowlers & Richardson, 1996). Their "abnormal" functioning may be more saliently expressed by understanding their affiliation to a culture that is not appreciative or reflective of the majority group which rules or guides the social context in which they live. However, we propose that while such a motivation would appear reasonable and politically correct, it is still insufficient scientifically.

First, such an approach presupposes that the role of psychology is partially if not largely oriented toward righting the wrongs of a society's ancestors and, hence, primarily a social enterprise. While a reasonable goal, that would appear to us as insufficient. Second, one might assume that understanding others who are, by design, difficult to understand, is again a reasonable goal. While we believe that for individual cases and in clinical situations this is not only desirable but also ethically appropriate, again this paradigm is insufficient. A third goal, rarely addressed by workers in this field, rests on our belief that the study of culture and psychopathology combined (especially from a cognitive or neuro-cognitive perspective) provides a much larger pool of data about the human condition than previously used paradigms.

An example of this approach is found in a study from the World Health Organization (1973). They reported that with regard to schizophrenia, in Nigeria 58 percent and in India 51 percent of hospitalized individuals experienced complete remission two years after treatment. In contrast, in Denmark only 6 percent remission had been reported. The question then becomes what aspects of Nigerian and Indian culture are present (and absent in Denmark) that allow for such a high rate of recovery. The Basic Behavioral Science Task Force of the National Advisory Mental Health Council (1996) reported that in Los Angeles, Mexican Americans indicated that schizophrenia was a transitory condition associated with nervousness, whereas Anglo-Saxon counterparts believed that schizophrenia was a permanent and total deterioration of mental functions. The question becomes how culture is defined and how it helps focus the more global issue of human function and dysfunction. Thus, we believe that basic principles about the human condition are best understood by studying the interrelations of culture and psychopathology through the use of scientifically and cognitively (possibly neuro-cognitively) based psychological instruments. Such an approach could potentially yield unique insights into individual differences and general theories of psychological function and dysfunction.

However, individual differences and diversity are often viewed as impediments to the development of general principles of behavior. Of significance is the lack of understanding and sensitivity for larger group differences. Few would question the basic ability of specific psychological tests (e.g., Wechsler Memory Scale) to assist in the discrimination or classification of specific diagnostic groups (i.e., those with, from those without memory dysfunction). In contrast, few would agree that affiliation with specific demographic groups (non-diagnostic) would be of great value in diagnostic classification. Presumably, this assumption is based on the concept that psychopathology (or for that matter, skills, abilities, or any other behavioral variable) is relatively free from the contamination of these potential confounds (Westermeyer, 1987a). Thus, this perspective suggests that other (non-diagnostic) group membership, while possibly important in some capacity, would have little or no effect on nomenclature issues. Such a belief is deeply rooted in nonempirical foundations, and its beginnings lie in a number of historical trends, none well documented or acknowledged. Thus, speculation rather than definitive analysis is the source for the following observations.

Traditionally, few attempts have been made to understand the behavior of individuals in minority groups. Brislin (1988) and others have cogently argued that psychologists for too long have categorically shown poor understanding of behavior traits and patterns of individuals who do not belong to groups associated with mainstream society. This limited perspective of the nature of behavior was first addressed by Frank Beach (1950) in his now classic article, "The Snark was a Boojum." In more contemporary terms, Robert Guthrie's (1976) book, Even the Rat was White, cites clear evidence not only of restricted sampling but of limited understanding of many other species (in the case of Beach) or other racial and ethnic groups (in the case of Guthrie).

One direct outcome of this situation, shown in recent statistics, is that few individuals appear interested in studying how understanding racial and ethnic group membership may contribute to understanding behavior. The article by the American Psychological Association's (APA) Committee for Human Resources, "The Changing Face of American Psychology" (Howard, Pion, Gottfredson, Flattau, Oskamp, Pfafflin, Bray, & Burstein, 1986), underscores the paucity of minorities pursuing study and being associated with all areas of psychology. Of special concern is the limited number of minorities in graduate schools and in faculty positions. Hall (1997) has suggested that "cultural malpractice" exists across all aspects of psychological pedagogy, research, and clinical activities. Bernal and Castro (1994) indicated that only 12 percent of all clinical programs require courses involving cultural issues, but 89 percent of the programs indicated that they "integrated" such issues into their program. Interestingly, they reported that approximately half of all clinical programs did not have an ethnic minority on the staff. To add insult to injury, fewer than 10 percent of clinical students are of color. These disturbing trends persist a decade after the historical report in 1978 by the President's Commission on Mental Health.

The lack of understanding combined with the lack of resources to solve the problem will clearly lead to further complications of an already complex issue. Nevertheless, the common denominator is limited understanding. This limited understanding of minority populations has resulted in overrepresentation of minority groups in several distinct psychopathology groups. Maheady, Towne, Algozzine, Mercer, and Ysseldyke (1983) and others have observed that members of minority or underrepresented groups tend to be overrepresented in special education programs, especially programs for the mildly handicapped. However, it is unclear that less "biased" tests will produce less overrepresentation.

The overrepresentation of minority groups in handicapped conditions has, in turn, resulted in negative stereotypes. In 1991 the Department of Education reported that African Americans comprise 21 percent of total enrollment but an astonishing 42 percent of individuals labeled educable mentally retarded, 38 percent of those in educable mentally retarded, and 22 percent of those considered learning-disabled. Hispanics comprise 13 percent of the total enrollment, but 10 percent of educable mentally retarded, 22 percent of trainable mentally retarded, and 12 percent of learning disabled. In contrast, for Asian students total enrollment is reflective of enrollment in special programs. Such stereotypes in the short term encourage the assignment of individuals to incorrect diagnostic groups (e.g., learning disabled). In the long term this stereotypical and grossly incorrect database may eventually serve as a foundation for potentially incorrect theories and research programs on racial and ethnic differences (e.g., Jensen, 1980). While all valid programs of inquiry should exist (Kuhn, 1970), constraints on the scientific process fueled by emotional and unempirical variables have little value for the discipline, for the science, for society, and most of all, for members of ethnic-minority groups. But as late as the end of the 20th century, we still are surprised and disappointed to read that "distinguished" historians of American culture continue to misunderstand the very essence of the issues at heart.

The purpose of this chapter will be to avoid such an orientation by focusing as much as possible on the data that are available. Initially, this contribution will focus on providing both historical and clinical background of testing of ethnic-minority group members. Standard clinical and psychometric practices involving individuals of minority groups will be presented and critiqued. Suggestions for theoretical shifts as well as practical clinical and psychometric approaches will be outlined, with cognizance of the potential pitfalls, perceived or real, that presently exist.

This chapter is primarily intended for North American audiences. Numerous limitations in the available data set, whether clinical or otherwise, would make a more geographically ambitious approach impossible. In fact, it could be argued that the most fertile research database is found in the states.

An initial step in understanding members of minority groups is to define such groups. According to accepted practice, individuals are different from larger groups if they are not members of that group. Group composition can be determined by social, legal, biological, statistical, and behavioral variables. Possibly the easiest and most socially acceptable variable is biological, such as color of skin. However, other variables may also play a role. Statistical methods define group memberships by numerical scores obtained, while social and legal approaches may use societal tradition to define membership. Behavioral and psychological variables represent the most robust method as they should be free of bias due to the use of empirical methods and the criterion in question, the function of the person. After all, the color of an individual's skin is much less critical than their thinking patterns when it comes to understanding such issues as capacity to learn.

Standard practices have used overt and obvious variables to classify members into minority groups. For example, if an individual is not white (Caucasian) in North America he or she must belong to a minority group. One need look no further than the disciplines of animal behavior and neuropsychology to realize that gross morphological signs are often not well correlated with clear behavioral patterns. For this chapter, Brislin's (1988) classification system for human diversity is adopted. Contrary to popular belief, only three races exist. These include Caucasian (e.g., white), black, and Indian. The Indian race can be subdivided into Native American (e.g., Cherokee, Incas, etc.) and Asian (e.g., Japanese, Chinese, etc.). Ethnicity is another variable that can be used to differentiate mainstream from minority groups. Here, ethnicity is defined as a collective identity (e.g., Jew, Italian, etc.). Next, group composition can be determined by culture (e.g., southern, urban, etc.). This variable implies that groups can be defined according to social and personal identification. While less understood and accepted, other variables could also assist in determining group membership. These
Nevertheless, the approach (though not include, but should not be limited to, gender, sex, necessarily the data) should be considered a physical status (e.g., disability), social class, and model for workers in other cultures, groups, or religion. In 1990 the Bureau of the locations (e.g., Native Indians in mainstream Census has more or less compressed these dis- Brazilian culture) in order to address the issues tinctions avoiding the differences between race, of psychological assessment of ethnic-minority culture, and ethnicity. In a bold step, they pro- group members. posed five different groups; Spanish/Hispanic/ 530 HANDBOOK OF PSYCHOLOGICAL ASSESSMENT

Table 21.1. Ethnicity and Race According to the 1990 U.S. Census Data: Origin, Total Number, and Subgroups

Spanish/Hispanic/Latin Background or Origin
Origin = Latin America or Spain; Total = 22,354,059
Subgroups: Cuba; Mexican/Mexican-American/Chicano; Puerto Rican; Hispanic Latin America (e.g., Panamanian, Peruvian, Venezuelan, Ecuadorian, Guatemalan, etc.); Spaniard

African American/Black/Negro
Origin = African or Caribbean; Total = 29,986,060

Asian or Pacific Islander
Origin = Far East, Southeast Asia, Indian Subcontinent, or Pacific Islands; Total = 7,273,662
Subgroups: Asian Indian; Chinese; Japanese; Korean; Vietnamese; Filipino; Hawaiian

Indian (American) or Alaska Native
Origin = North America; Total = 1,959,234
Subgroups: Aleut; American Indian; Eskimo

White
Origin = Europe, North America, Middle East; Total = 199,686,070

Latin Background or Origin, African American/Black/Negro, Asian or Pacific Islander, Indian (American) or Alaska Native, and White. The Spanish group contains five different groups whereas the Asian has seven separate subcategories and the Indians have three.

In the area of psychological assessment, race has been the most widely studied of the previous variables. Sex and, to a lesser degree, ethnicity have been considered as potential though not highly salient variables. However, culture, physical status, social class, and religion have rarely been considered important in understanding human behavior. Whether this neglect is due to collective wisdom or ignorance is not known (nor is it the focus of this chapter).

Regardless of the variable used, ethnic-minority group membership will be defined as indicated previously by groups who are both politically powerless and sparsely represented in scientific inquiry. However, what may be a minority group in terms of ethnicity in 1990 may not be by the year 2000. Census figures suggest, for example, that people of color (including African Americans, Asian Americans, Hispanics, and Native Americans) who now constitute less than 20 percent of the U.S. population will soon constitute approximately 50 percent of the American population (Basic Behavioral Science Task Force of the National Advisory Mental Health Council, 1996).

A necessary outcome of appropriately defining group membership is the implication that a minority member will engage in behavior that is different from the mainstream norm but not necessarily abnormal. Thus, clearer understanding of human behavior is the goal. Such an understanding is not only academically useful but also contains treatment implications. The importance of ethnic-minority group membership for psychological treatment has been outlined by Sue and Zane (1987), while Lawson (1987) has reported its implications for psychopharmacological intervention. Caution should be inserted here. Careful between-group comparison often implies limited concern for within-group analysis. Using the Hispanic population in the United States as an example, the behavioral patterns of Cubans,

Mexicans, and Puerto Ricans may actually differ more from each other than the entire group of Hispanics differs from Caucasians. In an ongoing translation and standardization of the Wechsler Intelligence Scale for Children, Hispanics have been further subdivided into Central Americans, Cubans, Mexicans, Puerto Ricans, and South Americans. Thus, within-minority group analysis will eventually become as important as minority versus majority group comparisons.

HISTORICAL PRECEDENTS

Galton's "Inquiries into Human Faculty and Its Development," written in 1883, is most often considered the beginning of psychological assessment (Boring, 1950). In order to evaluate human disabilities (and not sins, as had commonly been the case prior to Galton), this British pioneer developed the "mental test." While the test was intended to measure such variables as color discrimination and auditory reaction time, the purpose of establishing the Anthropometric Laboratory at the International Health Exhibition in London was to determine the range of human abilities. Together with the founding of the journal Biometrika and the Eugenics Laboratory, Galton attempted to develop the concept of racial improvement (Schultz & Schultz, 1996).

The discrimination of acceptable and nonacceptable human characteristics has, unfortunately, found its way into present-day mental testing, possibly by way of James McKeen Cattell. After obtaining his Ph.D. from Wundt in Leipzig, Germany, Cattell came into contact with Galton (Boring, 1950), who in turn had enormous influence both directly (e.g., with numerous students) and indirectly (e.g., as editor of Science) on the study of mental ability in the United States. However, it was not until the appearance of Henry H. Goddard at Vineland Training School in New Jersey, and later Lewis Terman at Stanford University, that a research program of psychological abilities became part of mainstream psychology.

Using "the evidence of mental tests," Terman (1916) indicated that "the average intelligence of women and girls is as high as that of men and boys" (p. 68). Nevertheless, he concluded in his book, The Measurement of Intelligence, that the "dullness" seen in "Indians, Mexicans, and Negroes raises the question of racial differences in mental tasks." Terman suggested, "Children of the group should be segregated in special classes and given instruction which is concrete and practical. They cannot master abstraction, but they can often be made efficient workers, able to look out for themselves" (p. 92). He continued, "There is no possibility at present of convincing society that they should not be allowed to reproduce, although from a eugenics point of view they constitute a grave problem because of their unusually prolific breeding" (p. 92).

Such an orientation is observed in Goddard's work and later in Yerkes's groundbreaking work with the Army Alpha and Beta tests during World War I. These tests were meant to classify A (intelligent) and D and E (feebleminded) individuals with a mean mental age of 13.08. (This score may have prompted Goddard to term any adult with less than 13 years of mental age as "moron.") However, both immigrants and nonwhites tended to score lower, prompting Yerkes (1923) to write in Atlantic Monthly about noninherited racial differences. This conclusion readily supported the racist opinion of Madison Grant, who considered Nordics superior to other races. Based on these observations, Yerkes and others encouraged strict immigration laws, especially for "the negro." To curtail the reproduction of those already in the United States, several American followers of Galton (namely John H. Noyes and Victoria Woodhull) established a center for American eugenics in Cold Spring Harbor with financial support from the Carnegie Institution (Leahey, 1997). One of the greatest proponents of eugenics, Henry Goddard, published his famous book The Kallikak Family: A Study in the Heredity of Feeblemindedness (1912). This book, probably more than any published work of the time, was used for the control of reproduction by ethnic minorities.

Reflecting the influence of this and similar works, sterilization and vasectomy became common phenomena. According to Leahey, one of the greatest landmark decisions on the issue was that of a mental patient, Carrie Buck. After giving birth to a retarded child out of wedlock, the "feebleminded" Buck was involuntarily sterilized. She, in turn, sued the state of Virginia but lost in a split decision at the Supreme Court level. It seems as though unempirical (and presently considered unethical) approaches to the measurement of abilities are never easily resolved scientifically. Earlier in this century this issue was far from being resolved academically. Approximately 50 years later Stephen Jay Gould (1981) continues to argue that such strong conclusions have indeed been based on weak data. Unfortunately, this approach to the understanding of minority behavior, at best weak and spurious, was the foundation for the Jewish genocide by the Nazis. However, not until 1954 did the judicial branch in this country make strides to erase this previously accepted and now embarrassing "scientific" orientation. The Brown case in 1954 allowed for desegregating of races in the school system. However, cases specifically referring to minorities and testing did not surface until the 1970s (Reschly, 1984). Generally, the plaintiffs in these cases represented the three major minority groups of the time (African American, Hispanic, and Native American) who had been poorly and unethically classified as retarded.

While most of the cases were won or favorably settled out of court, it was not until the legislative aspect of the litigation-legislation cycle occurred (Bersoff, 1981) that reform began to be developed and later implemented. According to Reschly (1984), the federal Education for All Handicapped Children Act of 1975 "was the most important and most widely applicable legislative act." This act opened the road for later litigation meant to define more succinctly the spirit of this law.

Perhaps linked to these legal efforts, psychologists have become increasingly aware of the need to document human abilities more carefully. In his introduction to the special issue, "Cultural Factors in Understanding and Assessing Psychopathology" (Journal of Consulting and Clinical Psychology), James Butcher (1987) stated that "the application of psychological procedure and methods with patients from different cultural backgrounds raises numerous methodological issues." Issues such as psychological equivalence, test reliability and validity, and test utility were some of the factors that Butcher considered critical. Five years later, Bethancourt and Lopez (1993) maintained that the study of ethnic minorities still holds "at best second place" in mainstream psychology. This has occurred despite critical reports suggesting that such an approach would be detrimental not only to ethnic minorities but to psychology at large (McGovern, Furomoto, Halpern, Kimble, & McKeachie, 1991).

In this decade undoubtedly the most significant and most detrimental work on this issue comes from Herrnstein and Murray (1994) in their highly controversial book, The Bell Curve. Herrnstein and Murray aggressively pursue the traditional concepts that ethnic minorities do not score well on standardized tests, including tests of achievement and intelligence, because of genetic and biological limitations. In many respects these authors provide a modern-day version of the ideas of Terman, Goddard, and others (e.g., Graham) linked to immigration laws, reproductive limitations, and intellectual and social segregation.

Sternberg (1997) has cogently addressed the importance of ultimate criterion, possibly lifelong learning capacity, rather than test intelligence. Test intelligence may be a significant though, by design, incomplete condition for understanding learning. Such intelligence appears, according to Sternberg, to be predictive of scholastic achievement. However, this type of achievement is only partially correlated with lifelong success. Two other factors are suggested by him that are obviously not understood by Herrnstein and Murray: behavioral intelligence and intelligence. It is the concept of behavioral intelligence that we find particularly interesting. One could clearly argue that a migrant worker whose native language is Spanish would do particularly poorly on the SAT or, for that matter, the GRE. Thus, there would be little question that such a person would not gain admission to most selective colleges in the United States. Further, one could argue that such an individual would undeniably do poorly in a traditional university curriculum. But to argue that such a person is biologically or genetically inferior seems downright stupid. For example, such a migrant worker has found a method to travel from rural Central America to, say, the eastern United States, with little money, inadequate transportation, and limited understanding of the culture. They are able to find work, complete the task, live frugally, send money back to their homeland, and locate alternative employment within days of completion of the job at hand, often in another state. One could argue that a suburban-raised Anglo-Saxon who has played on the high school sports team and has dedicated his or her life to spectator sports and socially driven concerns could not under any circumstances go to Central America and replicate what their counterparts have accomplished in the United States. The question becomes, are we measuring "true" intelligence or some understanding of culture?

Of course, to the typical reader of this chapter, such an example seems rather extreme. Hence, we have chosen to provide another example, which should be closer to the experiences of most psychologists. In another and more recent article Sternberg and Williams (1997) suggest that the GRE, still the most widely used standardized measure of achievement for acceptance into graduate school, predicts little in terms of graduate school performance and maybe less than that in terms of career success. Additional and also sophisticated arguments against the limited arguments of The Bell Curve are also found in Gould (1996). Thus, the criterion for intelligence and achievement and, for that matter, pathology, according to Sternberg (and suggested by Gould) as well as accepted here, is not test scoring (especially alone and out of context) but life-long ability to adapt to the demands of life. This chapter attempts to build on this new-found scientific interest in an effort to determine the needs, limitations, and directions associated with the psychological assessment of ethnic-minority populations in North America.

THEORETICAL ISSUES

In a chapter of this type it would be essentially impossible to address all pertinent theoretical issues that apply to the psychological assessment of ethnic minorities. We have chosen to focus on three main issues (bias, acculturation, and culture), believing that they are the three most critical issues involved in this area of study.

Bias

Kenneth Eells pioneered the concept of bias in mental measurement, specifically the mental test. While his work focused only on whites, it did address the importance of difference (in this case, social class) in the assessment of mental function (Eells, 1951). Although the reasons for doing so are not entirely clear, some workers in psychometrics generalized his findings to other populations, namely African Americans. This incorrect generalization launched a wave of poorly developed and executed studies on bias in testing.

One of the most controversial figures in mental bias research is Jensen, of the University of California at Berkeley; his most controversial book is Bias in Mental Testing (1980). According to Jensen, mental testing has been criticized because of one or more of the following reasons:

1. cultural bias
2. specific test items
3. inability to define or measure intelligence
4. tests that measure too narrow a range of abilities
5. failure to measure innate capacity
6. IQ tests that measure only learned skills
7. IQs that are inconsistent
8. test scores that are contaminated by extraneous factors
9. misuses, abuses, and undesirable consequences of testing

According to Jensen (1980), these criticisms are largely unfounded and confused with other factors. As he wrote, "Anxiety about one's own status, or the importance of the traits measured by tests, or sympathy for the less fortunate, may prompt the acceptance of criticisms of tests without evidence" (p. 23). Unfortunately, such critiques tend to focus on IQ tests and are emotionally interpreted. They complicate the question and prevent adequate understanding of the valid issues.

Reynolds and Brown (1984) presented a set of reasons, which are applicable to bias for a wider range of tests. These include:

1. inappropriate content
2. inappropriate standardization samples
3. examiner and language bias
4. inequitable social consequences
5. measurement of different constructs
6. differential predictive validity

Regardless of the source of bias, the definition of bias must also be considered. Unfortunately, numerous definitions are available in the literature, some more heuristic and plausible than others. The following are two samples of the many available.

[Bias results from] differences in the extent to which the child being tested has had the opportunity to know and become familiar with the specific subject matter or specific process required by the test item. (Eells, 1951, p. 54)

Psychometric bias is a set of statistical attributes conjointly of a given test and two or more specified subpopulations (Jensen, 1980, p. 375)

Flaugher (1978) has suggested that test bias can mean more than simple knowledge or psychometric deficiencies. Indeed, bias could be represented in a wide variety of concerns including, but not limited to, both psychometric issues (mean differences, differential validity, item content, internal validity) and test usage (over-interpretation, selection model, and atmosphere). He concluded that in 1978 the research was promising, but the results were still disappointing. Twenty years later, the research and the results are both disappointing.

Among the more current research findings, an excellent example is Drasgow's (1972) article, "Biased Test Items and Differential Validity." In this review the author addresses differences between majority and minority groups in validity coefficients. The results of his study provide support for earlier findings suggesting that validity coefficients may not prove useful in examining test bias. He concludes: "Test scores can be used to predict criterion performance for minority group members. Nevertheless, it may be inappropriate to compare test scores for minority group members with test scores for majority group members" (p. 529, italics added). In a similar vein, Cole (1981) concluded in her article, "Bias in Testing," that "there is no large-scale, consistent bias against minority groups." Nevertheless, both "subtle aspects of the testing situation" and presumably more refined understanding still evade workers in the field. In contrast, Humphries (1986) has argued that even if items differ between groups, these items should not be labeled as biased if adequate measurement properties are taken into account.

Recently, the American Psychological Association released the results of a task-force study on these issues (Neisser, Boodoo, Bouchard, Boykin, Brody, Ceci, Halpern, Loehlin, Perloff, Sternberg, & Urbina, 1996). Presumably the focus of this task force was to address authoritatively the issues brought out by The Bell Curve. Whereas the article addressed numerous critical issues, it fell far short when addressing ethnic-minority issues. The task force assumes, without question, that if tests are to be used as predictors of future performance, these tests do not seem to be biased against African Americans" (p. 93). However, less clear evidence of bias is presented against other ethnic-minority groups. Unfortunately, one is left with the sense that outside Asian Americans, other ethnic minorities score below their Anglo-Saxon counterparts. Further, they suggest that numerous factors, but not necessarily bias, ranging from economics to genetics, may be playing a role in these differences. Despite these opinions, many questions have yet to be formulated and, of course, answered. Until then, as Reynolds and Brown have concluded, the verdict on test bias is still not in (scientifically).

However, Suzuki and Valencia (1997) have suggested that a significant drop in bias research exists. They believe this precipitous drop is due to the belief that bias does not exist. This erroneous belief is due to the following reasons:

1. Some minority groups have not been studied extensively and, in some cases, not at all.
2. Test bias in school-based tests has been done only with some but not a large variety of tests.
3. In actuality, there are mixed results. They report several studies by Valencia and colleagues which suggest that the K-ABC contains bias in predictive validity and content but not in construct validity or reliability.
4. Bias research has traditionally been done with nonpatient populations.

Malgady (1996) proposes an interesting twist to the theoretical foundations of bias research. He proposes that what is necessary is to reverse the null hypothesis. That is, we must assume that bias exists. If one commits an error in measurement, then it is better to assume that bias exists so that precautions are taken to protect the individual.

Acculturation

If a minority group does poorly on a test, relative to a majority group, two interpretations may be used to account for the discrepancy. A rather emotional one is provided by Jensen (1980): that the difference is accounted for by biological factors such as genetics. A less popular interpretation used by researchers studying integration of an immigrant group into a majority or mainstream culture is that of acculturation.

Assimilation into a larger, more mainstream culture allows an individual to understand and adjust to the cultural, social, and psychological requirements of that culture. Conversely, those who do not adapt are considered to exhibit greater degrees of psychopathology. An illustration of the lack of adaptation was reported by Hoffmann, Dana, and Bolton (1985), who found that Sioux Native Americans with strong ties to tribal values and language were more likely to exhibit psychopathology as measured by the MMPI. These findings have also been replicated with other minority groups, including Hispanics (e.g., Montgomery & Oroz, 1984). Focusing on cognitive style and intelligence, Gonzales and Roll (1985) reported differences between Mexican Americans and whites on several test measures. However, no group difference was observed between Anglo-Americans and a subgroup of the original sample of Mexican Americans who had been shown to be acculturated to Anglo-American culture.

One method to determine whether acculturation has been achieved, and thus controlled, is to administer an acculturation scale. Marin, Sabogal, Marin, and Otero-Sabogal (1987) have developed a 12-item scale, which measures acculturation in Hispanic populations. The validation criteria included generation, length of residence in the United States, age at arrival, ethnic self-identification, and an acculturation index. These findings have been extended to children (e.g., Franco, 1983) as well as to other cultural groups such as Asian Americans (Suinn, Rickard-Figueroa, Lew, & Vigil, 1987). Preliminary findings suggest that age (younger), sex (male), and length of exposure to the predominant culture (Burnam, Telles, Karno, & Hough, 1987) as well as cultural awareness and ethnic loyalty (Padilla, 1985) are critical factors in the acculturation process. Another scale used for acculturation is the Acculturation Rating Scale for Mexican Americans (ARSMA and ARSMA-II) (Cuellar, Arnold & Maldonado, 1995; Cuellar, Harris & Jasso, 1980). While this is a promising scale, more research is necessary to generalize use to other Hispanic and ethnic-minority populations. Fradd and Hallman (1983) concluded that until an individual has been taught strategies to build bridges from a previous to a current domain of knowledge, the validity of test measures is questionable.

The process of acculturation must be understood, however, as a dynamic rather than static process. Acculturation does not imply reaching an imaginary threshold at which time one becomes clearly acculturated. Knight and Kagan (1977) reported that it took about three generations for Mexican-American children to develop modal responses of Anglo-Saxon children with regard to social motives.

Four stages have been postulated (Basic Behavioral Science Task Force of the National Advisory Mental Health Council, 1996). They are assimilation (becoming part of the majority culture), acculturation (adapting to the majority culture), alternation or biculturalism (adequately engaging two cultures), and multiculturalism (holding on to a personal and non-majority identity while participating in a goal-directed activity of the majority culture).

Further, acculturation is not dichotomous; instead, it is multifaceted (Phinney, 1996; Magana, de La Rocha, Amsel, Magana, Fernandez, & Rulnick, 1996). Triandis (1982) has suggested that culture could be physical (e.g., buildings, tools, etc.) or subjective (e.g., social norms, roles, beliefs, and values). The subjective could include family dynamics, religious beliefs, language limitations, individualism, and so forth. Thus, one could conceivably be adapted to a culture physically, live and appear to be American (i.e., live in North Carolina, dress in Brooks Brothers clothing, etc.) but have specific behavioral patterns that would clearly identify the person as non-North Carolinian (i.e., native language would be Spanish, have extended family, practice Catholicism, and so forth).

Berry (1990) has proposed an interesting theory of acculturation. The process involves three levels:

1. Antecedents: internal, external, and traditional.
2. Processes: cultural change, acculturation, psychological acculturation.
3. Consequents: changed cultural and social system, changed psychological status of persons.

Such a comprehensive approach appreciates the multidimensional and phasic nature of acculturation. Bethancourt and Lopez (1993) have also suggested that lack of acculturation implies perceived stress, presumably because of the individual's inability to "fit" with the majority culture. The problem is in determining why that "stress" exists by defining the specific cultural values (e.g., lack of understanding of social norms) that produce the lack of acculturation and eventually the perceived stress. Although not suggested by the authors, it is certainly implied that perceived stress must be controlled in order to make certain that what is measured is acculturation and not stress secondary to limited acculturation.

While it is clear that acculturation is seen as the ability of immigrants (e.g., Hispanics and Asians) to adapt to a majority group culture (e.g., the United States), the possibility is considered that an analogous concept could be applied to nonimmigrant ethnic minorities already living in North America. If, as it is argued in the next section, intelligence is largely, if not completely, a cultural phenomenon, then acculturating to the majority culture is a prerequisite for the development of successful learning strategies and eventual intelligence. Thus, it is argued that African Americans living in the United States but not participating fully in the American culture may not be acculturated. Hence, the same issues would apply to this ethnic-minority group as they would to Hispanics and Asians.

Culture and Ethnicity

According to recent position papers, neither bias nor acculturation may be the most salient variables that need to be addressed in understanding ethnic minorities. Bethancourt and Lopez (1993) suggested that most studies to date on the issues at hand have been at best descriptive of the differences between cultures. Whereas such approaches appear useful on the surface, they suggest that they are at best an initial step in the more important question of culture. To date, they argue, the field still does not understand the role of culture in behavior and cognition. For example, they believe that using race as a variable, either dependent or independent, is inappropriate. Zuckerman (1990) has reported that within-race differences appear larger than between-race differences with regard to biological variables. The hypothesis is made that similar assumptions can be generated with regard to behavioral and cognitive variables as well. Indeed, social class might be a more salient variable in grouping individuals than race. In other words, individuals from a high social class would more likely differ from persons from a low social class than would a white and a black individual from similar social classes. Bethancourt and Lopez suggest two approaches. The first is the bottom-up approach, which starts with the data generated from cross-cultural studies in order to then alter existing theories about human behavior. They suggest the work of Triandis and colleagues as a benchmark for this approach. The second, and more novel, is what they term the top-down approach. The essence of this approach is to determine how culture helps define the larger concept of human behavior and cognition. In other words, culture is seen, much like psychopathology in the earlier ideas of Neal Miller, as a unique way of being able to understand "normal" human function. Instead of examining how cultures are different, the focus would shift to determining what are the most salient variables in helping distinguish individuals. Using Hispanics as an example, one would ask not how Hispanics are different from, say, Anglo-Saxons but what variables contribute to those differences. In this ethnic-minority group the following variables might be pertinent: language, social orientation, family dynamics, and religion. Then, the question would become how, for example, language affects a person's behavior to the point of excluding them from a majority group, essentially making them pathological. A third-tier question then would be what should be pathological. Essentially, the final question would be what is the criterion for labeling pathology.

Phinney (1996) has furthered this concept by proposing that ethnicity, a subset of culture, could be studied at three levels. First, she suggests cultural values, attitudes, and behaviors that help define a group. Second, group identification is partially based on a subjective sense of what it means to belong to the group. Finally, she suggests that group identification is based partially on the specific experiences associated with that group identification.

Finally, an alternative to doing cross-cultural investigations is to begin by understanding culture. According to Greenfield (1997), "ability assessments don't cross cultures". Specifically, she suggests that values and meanings, knowledge, models of knowing, and conventions of communication are not easily translatable across cultures and could be culture-specific. That is, the criterion of a particular meaning must be understood before it is "translated". She concludes that "tests are not universal instruments".

Social Policy

Whether tests are biased or culturally free, whether an individual belongs to a minority or a majority group, whether different groups are biologically equal or unequal, group differences still exist. To deny the obvious would be inappropriate. Certain minority groups perform differently, more often than not worse, than majority groups on specific items, tasks, or tests. These differences drive social policy. Academic psychology would undoubtedly prefer to research these problems and discrepancies more thoroughly before allowing findings to affect the judicial and legislative process, because the data for any of these questions are at best inconclusive as well as emotional and at worst confusing.

However, policy must be and will be made in the absence of adequate data and in the presence of emotion (see Bersoff, 1981). This reality could explain why Cole (1981) concluded that test bias research is likely to have only a small impact on complex social policy issues. Regardless, there are issues that relate to the available data. In the first edition of this handbook, Reschly (1984) addresses the concept of fairness. According to him, two approaches have been adopted. Equal treatment implies no bias or documentation in selection procedures and that all candidates, regardless of demographic affiliation, are treated equally. An alternative to this approach is equal outcomes, which implies that selection should match population demographics. Regardless of the approach and the data, North American society has adopted in principle the concept of fairness. The question remaining is which method described by Reschly will be chosen and what implications, if any, the current paucity of data and lack of scientific agreement will have on social policy formation and implementation. Another and politically limited approach would be to assume that representation of the American population (or for that matter whatever criterion population was chosen) is a criterion of choice. Next, one could use the currently used measures described in this chapter within the context of subsamples. Specifically, if selection to a college is the goal, then a college would first choose to accept representation from all groups as desirable. Next, it would apply the traditional standards (e.g., standardized test scores) within the accepted or chosen subsamples (e.g., Caucasian, Hispanics, African Americans, etc.). Considering the predictive validity of these tests, it is hypothesized that within-group (or subsample) variance would be greater than between-group (i.e., high-scoring blacks and high-scoring whites) variance.

To assist policymakers, researchers need to place greater importance on studying issues of race, culture, ethnicity, and related variables. The findings must then be applied to broaden our limited understanding of differences in psychological test performance of minority group members. Hall (1997) suggests the following steps in attempting to reach these objectives: (1) ensure that the psychology curriculum is culturally diverse, (2) recruit and retain diverse faculty, (3) recruit and retain diverse students, (4) monitor, for the sake of accountability, efficacy of initiatives, (5) encourage culturally diverse research and publications, (6) increase the number of editors and reviewers of diverse background, (7) ensure minimum cultural competency for psychology students, (8) understand state-of-the-art research on topics of diversity, and (9) increase diversity within membership and leadership of the American Psychological Association.

Of course, there is the issue of who is to pursue these questions, both in academic and research settings. In the seminal article, "The Changing Face of American Psychology" (Howard et al., 1986), the future for ethnic-minority group representation is presented as quite dismal. While women have made significant strides, African Americans, Hispanics, Asian-Americans, and Native Americans continue to lose ground in terms of representation in graduate school ranks (Hall, 1997). Similar trends exist in academic ranks, and presumably in clinical settings as well (Bernal & Castro, 1994). Programs within the American Psychological Association, including the Minority Fellowship Program and the Minority Neuroscience Fellowship Program, may aid talented minorities to pursue graduate training. Unfortunately, undergraduate majors in psychology mirror the same trend (Puente, 1993). Indeed, by the time minorities have chosen a college, they most likely have committed to a course of study. Simply put, despite the urgency of the questions raised, the future for a better understanding of psychological assessment of ethnic-minority group members looks bleaker than its past, especially when issues such as the "pipeline" of prospective students are considered. Whereas one would hope that the natural forces or evolution of psychology would "take care of the problem", social policy initiatives may have to jump-start what psychology has verbalized yet never realized.

ASSESSMENT METHODS

This section of the chapter will focus on specific assessment methods, including interviews, standard measures, culturally sensitive methods, and behavioral assessment methods. As feasible, each section will cover a variety, but not an exhaustive set, of tests or assessment strategies including application (and/or translation), norms, limitations and cautions, and suggestions for use.

Interview

The interview, whether structured or unstructured, remains the initial step of any psychological assessment and also the most commonly used method for obtaining information. The interview is a frequently used method for obtaining data in cross-cultural contexts. As Zubin (1965) and others have pointed out, however, the unstructured interview poses problems since it may yield unreliable data resulting from a host of uncontrolled factors.

Structured interviews may help in avoiding these pitfalls. Numerous interview methods, including several presented in this volume, seem generally well suited for use with minority populations, especially since they are often based on objective diagnostic criteria (e.g., Research Diagnostic Criteria). Several of these methods are found in Table 21.2.

Table 21.2. Several Structured Interviews Applicable to Ethnic-Minority Populations

INTERVIEW                                                          REFERENCES
Brief Psychiatric Rating Scale                                     Overall & Gorham (1962)
Inpatient Multidimensional Rating                                  Lorr & Lett (1969)
Mental Status Schedule                                             Spitzer, Endicott, & Fleiss (1967)
Present State Examination                                          Wing (1970)
Structured Clinical Interview (DSM-III)                            Spitzer & Williams (1983)
International Classification of Disease Interview (for ICD-10)     World Health Organization (199?)
Mini-Mental State Examination                                      Folstein, Folstein, & McHugh (1975)

Although most of these structured interviews have been well studied and validated, validity studies often use the judgment of the clinician as the criterion variable (Spitzer & Williams, 1980). Further, it is well accepted that cultural and ethnic variables (such as behavior patterns, nonverbal cues, translation equivalence, concept equivalence, gender differences, and general cultural beliefs) are often misunderstood by even the most sensitive clinician (Hall, 1997; Spitzer, Endicott, & Fleiss, 1967; Westermeyer, 1987a).

Recent research has also revealed that expression of psychological symptoms is differentially affected by culture. Interviews and diagnostic conclusions are based on signs and symptoms which could be considered normal (versus abnormal) in specific cultures (e.g., belief in the devil, describing somatic abnormalities using metaphors, etc.). The end result is confounding symptoms with culture (Basic Behavioral Science Task Force of National Advisory Mental Health Council, 1996).

One way to avoid this complication is to use interview methods that either have been formally validated or are in current use with these populations. For example, the Present State Examination was an interview used for the international pilot study of psychopathology (World Health Organization, 1973). Another method is that of using a translator or someone knowledgeable about ethnic-minority groups. However, even this approach has limitations. It is not unusual for the translator to be a lay person with limited understanding of psychological principles as well as an individual with personal interest in the patient. Further, translators may not approximate a balanced bilingual, or worse yet, may not understand the culture in question. Distortion or misconception further impairs data gathering, especially with severely disorganized patients or individuals whose culture is very different from that of the diagnostician. Velazquez, Gonzales, Butcher, Castillo-Canez, Apocada, and Chavira (1997) suggested that an important and often overlooked first step in an evaluation is to allow the patient to choose the language to be used in the evaluation.

Several steps might be taken to attempt to control interview distortion. First, in order to bridge the language and cultural gap between patient and psychologist, rapport should be established prior to the interview. Greenfield (1997) has reported that ease in speaking to strangers, even though they are professionals, varies across cultures. For collective cultures (e.g., Asian and Hispanic), it is typical to limit discussions to only known individuals (Kim & Choi, 1994). In other words, one relates one's problems only to intimate or close friends or family friends. In contrast, in North America providing personal information to a stranger, but presumably a professional, increases the perception of objectivity and effectiveness.

Westermeyer (1987b) suggested that diagnostic interviews may take up to twice the usual time of a standard interview. Also, the clinician should make sure that ambiguous (whether real or imagined) questions or answers are clarified. Confrontation, the hallmark of some structured interview methods, should be avoided if possible since it may adversely affect client-clinician rapport.

By far the most important aspect of any diagnostic interview is to place the client in his or her own bio-psychosocial context and not the psychologist's context. Otherwise, a patient's behavior could be incorrectly interpreted as maladaptive (Adebimpe, 1981). To avoid erroneous conclusions, the psychologist must put special emphasis on understanding the patient's culture, race, ethnicity, class, or social context that grants him or her membership in a minority group. Not only must that context be understood but it should be understood as it relates to the patient's relationship to majority culture (e.g., Mexican migrant worker employed as a field hand in Colorado). Finally, and possibly most important, the clinician must understand his or her own limitations in other sociocultural situations. To enhance his or her understanding of others, the psychologist must become aware of, and possibly experience, other cultures and ethnic behavior patterns and cognitions. Hall (1997) has suggested that all clinicians must be well versed in these issues, initially in completing a multicultural graduate course and subsequently in clinical training.

Intellectual

Tests which attempt to measure the construct of intelligence are not only the most commonly used psychological tests (see related chapters in this volume) but also the most criticized (Neisser et al., 1996; Sternberg, 1997). The literature is replete with controversies about the efficacy of the construct of intelligence and its measurability (Gould, 1996; Helms, 1992), and strong and often emotional arguments have been levied against tests of intelligence by members of ethnic-minority groups (Herrnstein & Murray, 1994). Before these arguments are considered, the most commonly used tests of intelligence will be reviewed relative to their applicability to minority populations.

The application of intelligence tests to children of minority populations has yielded the most empirical data as well as the most controversy. Of the tests applicable to children, the Wechsler Intelligence Scale for Children (WISC) is one of the most popular psychometric tests of intelligence (Puente & Salazar, 1998). Although the WISC-III was published in 1991 (Wechsler, 1991), most data on this topic exist for the first two versions of this test. Excluding Asian children, the results in general suggest up to one standard deviation difference between ethnic-minority groups and the criterion sample, Anglo-Saxon children. Using Hispanic children as an example, it appears that these differences are erased if the child is a third-generation American. Thus, the issue might be more that the WISC might be measuring some type of acculturation process.

Nevertheless, conflicting and nonconclusive evidence is often found. For example, in one thorough review of the literature, the race of the

examiner did not seem to affect the validity of intelligence scores in African-American children (Graziano, Varca, & Levy, 1982). Using the criteria outlined by Jensen (1980) for determining bias in testing, Sandoval (1979) concluded that the "WISC-R appears to be non-biased for minority group children." Other factors are presented by Sandoval to explain observed minority versus majority group scores. These findings are supported by Ross-Reynolds and Reschly (1983) in a study involving Anglo, African American, Hispanic, and Native-American Papago. While no bias in the WISC-R was found against African Americans and Hispanics, ceiling effects influenced the response pattern of the Papagos.

Language, however, may be confounded in bilingual children and thus needs to be clarified prior to the administration of the WISC-R. Sandoval (1979) examined the evidence of cultural bias for Anglo, Hispanic, and African-American children. Further, the Spanish version of the WISC-R does not have acceptable norms for each cultural or ethnic group and should be used with extreme caution. Concern is also cited by Dana (1984), who indicated that the WISC-R is biased for traditional Native-American children. He indicated that a pattern of spatial > sequential > conceptual > acquired knowledge exists across both ages and tribes. The difficulties associated with using the WISC in other ethnic-minority cultures are discussed in Puente and Salazar (1998).

Lampley and Rust (1986) examined the validity of the Kaufman Assessment Battery for Children and found that African Americans scored significantly lower on this test. These findings are supported by others (e.g., Sandoval & Miille, 1980). Nevertheless, these conclusions are in direct contrast to those of Hickman and Reynolds (1986-1987), who reported that "blacks did not perform significantly better in the test form developed solely on their own item statistic."

It seems that regardless of the data, contrasting interpretations abound. An interesting and eloquent attack on these issues was leveled by George Jackson, chair, Association of Black Psychologists in 1975. A more balanced perspective on this issue is presented by Cole (1981) as well as Reynolds and Brown (1984) and Helms (1992). Additional commentaries and rebuttals are found in the 1985 article by Jensen in Behavioral and Brain Sciences.

Little information is found for adult intelligence testing with the Wechsler Adult Intelligence Scale Revised (WAIS-R) and, due to its recent publication date, the WAIS-III (though the items appear to be much less culturally biased and the norms are more reflective of the American population). For example, in the first edition of this handbook, Lindenmann and Matarazzo (1984) indicated that the Army Alpha was developed for literates and the Army Beta for the non-English speaking. The implicit assumption is that non-English-speaking individuals were illiterate. Of course, if the dominant language becomes that of the client, then it is the psychologist who is illiterate.

Using both the WAIS and the WAIS-R, Whitworth and Gibbons (1986) reported that differences were found using both tests and that the most significant differences appeared to be the conversion of race to scale scores. Reynolds, Chastain, Kaufman, and McLean (1987) re-analyzed the data for the 1981 standardization sample of the WAIS-R and reported a 14½-point difference between whites and African Americans on the Full Scale IQ. In attempting to resolve these discrepancies, Grubb (1987) examined the IQ differences in profoundly and severely mentally retarded individuals using Wechsler's test. He reported no differences between whites and African Americans in this sample of subjects and concluded that lower IQ scores of African Americans were not biologically determined and, instead, were attributable to other factors.

Unfortunately, few data other than the results of the Wechsler tests exist on measures of intellectual abilities. While one might expect that such tests as the Raven Progressive Matrices and the Beta would be less ethnically biased, the data provide little support for this (or the contradictory) view. For example, using minority group offenders, Hiltonsmith (1984) reported that these subjects actually scored lower on the Beta than on the WAIS-R.

Obviously, one of two things must be occurring. There is either incorrect measurement of intellectual function or some difference (not deficiency) is present. Before accepting the possibility of difference, measurement error must be eliminated or reduced to the lowest possible level. One possible way to address this is to use greater care and ingenuity in the construction of intellectual tests. Using the WISC as an example, care must first be taken in the development of items that are not culturally biased, both across cultures and within subcultures (e.g., Cubans versus Mexicans). Second, greater care must be taken in the standardization process. A typical protocol might include two phases, a try-out phase that helps develop further an item pool and a standardization phase that would closely mimic the U.S. population.

Another possibility is to consider intelligence from a totally different perspective. For example, Sternberg (1996) has suggested that intelligence is really nothing more than success in life. Hence, tests such as the Learning Potential Assessment Device (Feuerstein et al., 1979), Cognitive Assessment System (Naglieri & Das, 1996), as well as standard tests from the neuropsychological literature provide a more unique way to address the possible underlying variable, problem solving, in intelligence. Indeed, it is expected that future tests of intelligence will have strong foundations in neuropsychological performance.

Neuropsychological

It is often assumed that brain functions are not affected by non-neurological variables. A review of the table of contents of major neuropsychological texts of the 1980s and 1990s suggests that issues of culture, ethnicity, and race have not been addressed to date. Even more revealing are the reference sections of the books, which indicate that very few articles on these issues exist. A review of the existing journal literature also exposes the paucity of references surrounding neuropsychological assessment and the effects of culture, ethnicity, and race. In Reliability and Validity in Neuropsychological Assessment, Franzen (in press) presents an excellent overview of issues concerning most measures of neuropsychological ability. While different forms of validity are considered, no mention is made of the application of the tests to minority group members.

Most of the sparse data that do exist on this topic are found in the non-neuropsychological literature. For example, Lopez and Romero (1988) assessed intellectual functions in Spanish-speaking adults using both the WAIS and the Puerto Rican version of the WAIS. While the authors report that differences did exist, test equivalence is generally elusive and its application for these tests to a neuropsychological sample would be at best haphazard. On a more theoretical orientation, Drasgow (1972) addressed test-item bias and differential validity by using a "profoundly" biased test. However, in this case (as with all others), no direct or indirect mention is made of neuropsychological tests.

Anecdotal and clinical evidence indicate that these variables may have little, if any, effect on specific sensory and possibly motor measures. Some support for this contention exists. For example, Roberts and Hamsher (1984) administered both the Facial Recognition and Visual Naming Tests of the Multilingual Aphasia Examination to African Americans in a consultation setting. They reported negligible racial bias. In contrast, Adams, Boake, and Crain (1982) found that bias did exist with regard to several variables, including ethnicity, in neuropsychological performance. In both brain-damaged and normal samples, African Americans and Mexican Americans exhibited more errors than did Caucasian participants. One may extrapolate from early (though questionable) motor-learning studies on race that motor measures may be affected by race. However, as implied, the data are questionable because of numerous methodological and theoretical issues. Other individual variables are definitely affected. Language, for example, is a difficult variable to measure across groups because it contains syntactical, grammatical, and cultural content that precludes a direct translation/interpretation of a specific concept. For example, the location in a sentence of nouns and verbs differs across certain languages. Another example involves the Spanish alphabet, which contains two additional letters, ñ and ll. Cognitive styles may similarly be affected because of variables that directly affect cognitive manipulations, such as specific style or analysis of information. Additionally, indirect variables may play a role. Asians or Hispanics not acculturated to North American norms may find it difficult to permit a professional to examine "their minds." In certain subcultures this probing is allowed only by medicine men, witch doctors, or "curanderos." Thus, it may be impossible to obtain valid data because of the client's fear of testing.

While few individual neuropsychological tests have been adapted or translated, the two most widely used batteries, the Halstead-Reitan and Luria-Nebraska Neuropsychological Batteries, have been used with diverse populations. Both of these batteries have been translated into Spanish (HRNB by Melendez; Luria-Nebraska by Puente and colleagues) and are presently being used in other cultures (e.g., Chinese). Of the two, the Halstead-Reitan may prove, at least initially, to be more adaptable since the focus is less on language function than is the Luria-Nebraska. In both cases, however, the lack of data from diverse populations is presently hindering their application.

The data that do exist, though extremely sketchy, may indicate the direction for future research. For example, complications are introduced in a report on sex, age, developmental variables, and cognitive functioning by Denno, Meijs, Nachsshon, and Anrand (1982). Differences were noted on a variety of cognitive tests (e.g., Stanford-Binet) but only for four and eight year olds. Specifically, "white males scored the highest on all tests, followed by white females, black females and black males." Thus, variables such as sex and age may interact with race (and other variables). If these studies are found to be valid examples of neuropsychological measures, then a clear and easy identification of variables contributing to diversity of neuropsychological performance may not be feasible.

More recently, Perez-Arce and Puente (1996) reviewed the literature with a focus on understanding ecological validity of neuropsychological tests for Hispanics living in North America. If neuropsychological assessment focuses on problem solving, they suggest that different problem-solving strategies are employed by Hispanics. For example, many neuropsychological tests use time in assessing brain dysfunction. Whereas in a competitive culture, like the American, time is a critical variable not to be wasted, the opposite is often true for other cultures. Hence, slowed performance, which could actually be interpreted by the Hispanic as prolonging a task of interest, would be incorrectly interpreted as "brain-damage". Unfortunately, in their review the authors conclude that outside the work of a few researchers such as Ardila (e.g., Ardila, Rosselli, & Puente, 1994), little is found in the literature.

Personality and Pathology

Tests of personality could be generally categorized as one of two types: projective or objective. Projective or cognitive-perceptual tests (e.g., Rorschach) are quite commonly used with minority members because of their inherent ease of administration and superficial adaptability and interpretation. According to preliminary analyses by Exner and Sciara (personal communication, July 7, 1989), the Rorschach, an internationally accepted measure of cognitive-perceptual status, does not appear to be biased against Asian Americans, African Americans, or Mexican Americans. However, it is important to state that no data exist to support this contention.

In contrast, limitations of test adaptability are more readily investigated for issues such as bias with objective measures. Clearly, the most often used test of personality and psychopathology is the Minnesota Multiphasic Personality Inventory (MMPI). The homogeneity of the original MMPI sample limits its ready application to minority groups (see Butcher, Braswell, & Raney, 1982). According to Dahlstrom, Welsh, and Dahlstrom (1972), the norms used in the original MMPI sample were Caucasian, married, rural, blue-collar workers, with an eighth-grade education.

However, Dahlstrom, Diehl, and Lachar (1986), Dahlstrom, Lachar, and Dahlstrom (1986), and Lachar, Dahlstrom, and Moreland (1986) have suggested that even when important demographic variables are taken into account (e.g., race and socioeconomic status), approximately 12 percent to 13 percent of the total variance of the basic scales is accounted for. Still, the popularity of the test has resulted in translation into approximately 100 languages (Butcher, 1984; Williams, 1987), and a wealth of cross-cultural, ethnic, and racial studies based on research using this instrument have been published.

In an early review of ethnicity and the MMPI, Greene (1987) did an exhaustive examination of studies. Over 100 studies were analyzed according to type of scale and item level across groups including African American-white, Hispanic-white, Asian American, and Native American. Greene concluded that too many variables and too few adequately completed studies prevent conclusions of bias. The variables in question include subject parameters, ethnic group membership, profile validity, moderator variable, and scores analyzed. Additional methodological considerations include appropriate statistical analyses, adequate sample size, and validity of statistical (versus clinical) significance. Based on his review, Greene provided the following four conclusions:

1. At this stage of our understanding, it is premature to develop norms for specific ethnic and racial groups.
2. Subjects have to be identified with an ethnic group using subjective self (not clinician or experimenter) identification.
3. Empirical and not clinical differences should be emphasized.
4. Finally, more research needs to be focused on the special scales of the MMPI.

As exhaustive as the review is and as heuristic as Greene's conclusions may be, others advocate different orientations. For example, Gynther (1981), Gynther and Green (1980), and others argue that specific norms (and, in some cases, items) be developed, using an empirical methodology rather than a review of the literature. The need for a better understanding of ethnic, cultural, and race differences and their application to interpretation of T scores, specific scale scores, or patterns precludes widespread use of the MMPI with minorities. For example, it seems foolish to group all Hispanics together as Greene and others have done. As Sue and Zane (1987) have indicated, being culturally sensitive is being aware of within-group heterogeneity. Further, little understanding appears evident in the MMPI research with regard to differences among culture, ethnicity, and race. Until such issues, as well as those outlined by Greene (1987), are resolved, not only will the MMPI data as it now stands be premature; it will be incorrect. According to a recent announcement from the Restandardization Committee of the University of Minnesota Press (1989) concerning the MMPI-2, published in 1989, the revised version will have "national norms that are much more representative of the present population of the U.S." (p. 4).

Dana (1995) criticizes the sampling methods of the MMPI. In the MMPI, for example, Hispanics were underrepresented by 2.8 percent. In the MMPI-2, similar limitations are still noted. Further, differences are noted in scale performance between Hispanics and their Anglo-Saxon counterparts. Main differences are noted in the following scales: L, K, 3, and 4. He suggests that clinical interpretation must hinge on understanding the acculturation of the individual.

Using the recommendations of Velazquez and colleagues (1997) for the use of the MMPI with Hispanics, the following recommendations are suggested for all ethnic minorities:

1. When options are available, use the most recent version of the test.
2. Administer the entire, and not a short, version of the test.
3. Appreciate prior test-taking experience of the test-taker.
4. Test in the language selected by the test-taker.
5. Evaluate results within a bio-psychosocial context.
6. Appreciate the effects of acculturation on test results.
7. Interpret results based on research literature and not on cultural stereotypes.
8. Always use a variety of test sources to arrive at conclusions.

Achievement, Aptitude, and Interest

Achievement tests are still widely used in a variety of settings. A starting point involving achievement assessment is that of achievement motivation (Basic Behavioral Science Task Force of the National Advisory Mental Health Council, 1996). Indeed, it is assumed that motivation plays a relatively small role in achievement testing. Nevertheless, access to models, ethnic-minority status, and related variables produce an initial "handicap" in such testing. Unfortunately, motivation appears highly correlated with scores on achievement tests as well as academic performance. Hence, what one might be measuring with ethnic minorities is not achievement as much as motivation.

What does exist, as with many other psychometric instruments, is a paucity of data. In chapter 7 of this handbook, there is a comprehensive review of achievement tests. Of the tests discussed in that chapter, the California Achievement Test (in education) and the Wide Range Achievement Test (in education and clinical application) are two of the most frequently used tests that have been applied to nonmajority samples of the U.S. population. Initial findings regarding test bias in these measures reflect the conclusions outlined by Fox and Zirkin (1984) in the first edition of this handbook. Specifically,

they suggest that while attention should be paid to the possibility of such bias, and while it may be intuitive that such bias would exist (at least on specific items), these tests should not be considered biased. This conclusion is in direct contrast to others, however. For example, Weiss (1987) considered the Scholastic Aptitude Test especially biased in the verbal section. While Golden Rule procedures have been applied to reduce such biases, the reliability and validity of these tests may be in jeopardy (Linn & Drasgow, 1987). Thus, conflicts exist in terms of having a useful but unbiased test of achievement.

For tests of interest, even fewer data exist. While separate scales for sex are the rule and not the exception for measures of occupation, it is generally assumed that other variables are of little importance. The same applies for interest surveys. For example, the Kuder Occupational Interest Survey (Form DD) (Kuder, 1966) as well as the Holland Interest Inventories (1979) consider academic major, occupational status, and even personality type, but not cultural, race, or ethnic factors. The Strong-Campbell is available in Spanish but the norms presumably are from non-Spanish-speaking samples. In a recent study, Drasgow and Hulin (1987) attempted to answer the question of whether scores on the Job Descriptive Index (a vocational measure) varied across different Hispanic populations. Specifically, they compared bilingual Mexicans in Mexico City to other Hispanics residing elsewhere. While few differences were noted between the New York and Miami samples, large differences were noted between the U.S. and Mexican samples. Drasgow and Hulin concluded that both linguistic and cultural measurement equivalence must be addressed in measures of vocational interest.

What appears to exist is that differences between ethnic minorities and the criterion counterparts exist as early as the first grade (Jackson, 1975; Task Force of the National Advisory Mental Health Council, 1996). Asian students appear to perform better than either Anglo-Saxons or other ethnic minorities. Such differences are even greater by the fifth grade. Asian families tend to explain this in terms of motivational differences. In contrast, American mothers consider good performance of their children to be related to "natural" ability. Unfortunately, similar data are not available on Hispanic and African-American children. However, it is often thought that, even when intelligence is controlled, achievement differences may be due to biological variables in these ethnic-minority groups. According to the task force report, "Americans regard the need to try harder as evidence of low innate ability and are less likely to value or encourage such effort" (p. 725).

While it is assumed that ethnicity, race, and related variables have been explored by the Educational Testing Service and related psychological test corporations, again few scientific data exist in the public domain regarding tests of aptitude. Terman (1916) helped develop the now widely used Stanford Achievement Test for precollege screening with no reference to minority groups. At the college level, the College Advanced Placement Examination is also widely used and accepted. However, data on minority populations are still lacking for both of these instruments since relatively few ethnic minorities enroll in such programs.

Culturally Sensitive Measures

Traditionally, one method of avoiding test bias with regard to culturally different populations is to use instruments that are sensitive to and factor out cultural variables. Of the attempts to diminish test bias, the most significant effort has been by Cattell (1963). His Culture-Fair Intelligence Test measures intellectual abilities that allegedly factor out culture.

Cattell's basic aim was to factor out both cultural and educational variables from intellectual factors. Items were developed on common rather than culturally specific knowledge. Based on initial speculation, Cattell suggested that fluid intelligence was a function of biological factors including genetic and constitutional ones. In contrast, crystallized intelligence was a result of the development of fluid intelligence through environmental and cultural opportunities. While the Culture-Fair Test has been regularly used in the United States, its popularity has extended to non-North American populations. To date the instrument has been used with Nigerian (Nenty, 1986), Bulgarian (Paspalanova & Shtetinski, 1985), Italian (Stepanile, 1982), Spanish (Ortega-Esteban, Ledesma-Sanz, Lopez-Sanchez, & Prieto-Adanez, 1983), Israeli (Zeidner, 1987), and Indian (Ravishankar, 1982) groups. Unfortunately, the test has been shown to exhibit bias in some (e.g., Nigerian) though not all populations. In addition, these studies were completed with individuals residing in their own culture. It would be interesting to explore the efficacy of this test with minority cultures residing in the United States. While this test is promising in theory, additional research both in the United States and abroad will have to occur prior to its wider clinical acceptance. It is interesting to note that Cattell, who died very recently, has been accused of being biased. Indeed, this accusation was dramatically brought to the attention of psychologists by the 1997 American Psychological Association Keynote Speaker, Elie Wiesel. Unfortunately, the blue-ribbon panel that was convened to explore these allegations was disbanded before any discussions ensued (partially due to the fact that Cattell removed his name from further consideration for the award for which he was being considered).

Of all standardized tests, the WAIS has received most attention with regard to cultural standardization. Two excellent examples are the Canadian and Puerto Rican versions of the test (Wechsler, 1960). Violato (1984) administered the standard or a revised version of the WAIS to 101 Canadians. The revised version contained eight items that were changed to increase face validity for Canadians. While bias effects were limited, the author did suggest that changes for Canadian administration of the WAIS were necessary. The WAIS has also been translated and standardized with Puerto Rican populations (1980). It was assumed that all translations would be appropriate; this assumption, however, is incorrect. Puerto Rican, Chicano, Mexican, Latin American, South American, and Castilian Spanish not only have their own dialects and idiosyncrasies but, in many cases, their own language. Thus, the Puerto Rican translation of the WAIS has limited usefulness with non-Puerto Rican subjects. Further, though yet to be researched, the issue of norms needs to be addressed. For example, Puerto Rican norms may differ from Argentinean norms. Also, there is the question of when an individual, from one culture but residing in another, becomes acculturated enough to be administered the "new" culture's tests. These and related questions remain to be answered.

Other tests of intellectual ability which are purported to be culture-reduced or fair include Raven's Progressive Matrices--both Colored and Standard versions--as well as the Peabody Picture Vocabulary Test, the Quick Test, and the Army Beta. However, little evidence exists on the ability of these tests to be culture free. With the Picture Vocabulary Test serving as an example, several of the pictures on this test are useful for North American but not British populations. Another interesting example is that of the Luria-Nebraska Neuropsychological Battery. Certain sections and stimuli are deemed culture free or culture reduced; but several of the visual stimuli come from Denmark and not Nebraska, making clear identification of specific items (e.g., nutcracker) an often difficult if not impossible task.

Less and less bias research is being done, as indicated earlier, since the belief that bias does not exist is so prevalent (Suzuki & Valencia, 1997). Hence, one might assume that tests such as the ones developed by Cattell will continue to lose favor as psychologists continue to, possibly incorrectly, assume that culture is not a critical factor in psychological assessment.

Behavioral Assessment

In another section of this handbook, chapters on behavioral assessment are found. One major focus of this type of assessment is the assumption that behavioral, versus psychometric, approaches to assessment reduce the risk of focusing on psychic and nonobservable attributions. Psychometric focus may increase the potential for incorrect understanding of the behavior in question and, of course, is more likely to introduce bias in the assessment process.

Behavioral assessment focuses on empirically based methods of understanding behavior and, thus, the application to minority populations seems obvious. If psychometric tests are riddled with questions of culture, race, and ethnicity, then an assessment procedure that focuses on behavior and places the individuals in question in their environmental context would seem an excellent alternative. Hence, it is surprising to note that this application has not been considered and researched adequately.

What scientific literature does exist is limited and, at best, preliminary. For example, Slate (1983) attempted to compare three nonbiased "behavioral" measures in retarded and non-retarded children across race and social class.

Unfortunately, the results are so convoluted that they preclude an adequate understanding of any of these measures. Further, the possibility exists that behavioral measures may themselves be biased, partially due to the rater as well as the rating instruments. Lethermun, Williamson, Moody, and Wozniak (1986) examined the effects of the race of the rater on the rating of social skills of African-American and white children. The results support earlier findings that the race of the child affects the ratings received. In addition, the researchers reported that racial bias effects were noted with both African-American and Caucasian raters.

While intuitive support exists for the use of behavioral assessment with non-mainstream populations, complications are evident in the literature. First, little data and even less clinical application of this approach are available. In addition, initial studies suggest that bias may still be present both in terms of the rated and the rater.

SUMMARY

Understanding human behavior requires an understanding of human diversity. Unfortunately, historical foundations have dictated an incorrect understanding of how culture, race, ethnicity, and related demographic variables affect human behavior. This situation is evident in the traditional and current use of psychological tests to measure such variables as intelligence, achievement, abilities, aptitude, personality, and neuropsychological function.

Two factors appear to have guided this incorrect measurement of human diversity. First, pioneers such as Terman not only suggested that minorities were inferior but that their "proliferation should be controlled." Legislation and adjudication addressing minority bias continue to this day, especially in California and Texas, even at the level of the Supreme Court. Indeed, recent rulings on affirmative action call these issues into question. Second, few researchers, academicians, or clinicians have devoted time and effort to answering pertinent questions on human diversity, and even fewer have studied psychological assessment of diversity. Recently published statistics indicate that fewer ethnic minorities than in earlier years are pursuing graduate training in psychology or the study of human diversity. The lack of interested personnel is mirrored in faculty and clinical positions throughout North America.

Of course, the possibility exists that what is actually occurring is what has been previously described as "the false uniqueness effect," which is the tendency to overestimate one's personal positive attributes and underestimate others' abilities (Basic Behavioral Science Task Force of the National Advisory Mental Health Council, 1996). This Task Force reports that American children tend to think better of themselves when compared to others, presumably ethnic minorities. As adults, Americans tend to think of themselves as more attractive and intelligent than average. Further, 60 percent of students believed that they were in the top 10 percent of ability to get along with others--clearly an impossibility. One might assume that what has transpired is that American psychology has failed to understand the possibility of a "false uniqueness effect" and has confused such variables as test intelligence, which are highly influenced by economic and related factors, with biological and genetic superiority (see Jensen, 1980, for further information). It appears that psychologists have decided that when an ethnic-minority group does better than the majority group (as is the case for Asian Americans), the difference is attributed to motivation and cultural variables (e.g., Suzuki & Valencia, 1997). In contrast, when an ethnic minority does more poorly than a majority group, it is often ascribed to biological variables. Differential attribution of ethnic-minority differences--if better, it must be due to effort; if worse, it must be due to genetics--represents intellectual imperialism.

The obvious outcome is a field lacking adequate data and marked by much emotionality. The data that are available are clouded not only by a host of methodological problems but by researchers' gross misunderstanding of ethnic-minority group members and membership (including but not limited to within-group heterogeneity), especially in the context of majority-group behavior patterns. Regardless of the absence of data, social policy continues forward--often guided by political but not scientific correctness. Thus, much effort needs to be directed to the areas of research, teaching, and services to minority group members. Until additional adequate information is available, extreme caution should be used in the application of present knowledge of the psychological assessment of ethnic-minority group members and in the acceptance of previously considered "universal" theories of human function.

REFERENCES

Adams, R. L., Boake, C., & Crain, C. (1982). Bias in a neuropsychological test classification related to education, age, and ethnicity. Journal of Consulting and Clinical Psychology, 50, 143-145.
Adebimpe, V. R. (1981). Overview: White norms and psychiatric diagnoses of black patients. American Journal of Psychiatry, 138, 279-285.
Ardila, A., Rosselli, M., & Puente, A. E. (1994). Neuropsychological evaluation of the Spanish speaker. New York: Plenum.
Basic Behavioral Science Task Force of the National Advisory Mental Health Council (1996). Basic behavioral science research for mental health. Sociocultural and environmental processes. American Psychologist, 51, 722-731.
Beach, F. (1950). The snark was a boojum. American Psychologist, 5, 115-124.
Bernal, M., & Castro, F. (1994). Are clinical psychologists prepared for service and research with ethnic minorities? American Psychologist, 49, 797-805.
Berry, J. W. (1990). Psychology of acculturation. In J. J. Berman (Ed.), Nebraska symposium on motivation, 1989: Cross-cultural perspectives, 37, 201-234.
Bersoff, D. N. (1981). Testing and the law. American Psychologist, 36, 1047-1056.
Betancourt, H., & Lopez, S. R. (1993). The study of culture, ethnicity, and race in American psychology. American Psychologist, 48, 629-637.
Boring, E. G. (1950). A history of experimental psychology (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Brislin, R. W. (1988). Increasing awareness of class, ethnicity, culture, and race by expanding students' own experience. In E. S. Cohan (Ed.), The G. Stanley Hall lecture series (Vol. 8, pp. 137-180). Washington, DC: American Psychological Association.
Burnam, M. A., Telles, C. A., Karno, M., & Hough, R. L. (1987). Measurement of acculturation in a community population of Mexican Americans. Hispanic Journal of Behavioral Sciences, 9, 105.
Butcher, J. N. (1984). Current developments in MMPI use: An international perspective. In J. N. Butcher & C. D. Spielberger (Eds.), Advances in personality assessment. Hillsdale, NJ: Erlbaum.
Butcher, J. N. (1987). Introduction to the special series: Cultural factors in understanding and assessing psychopathology. Journal of Consulting and Clinical Psychology, 55, 459-460.
Butcher, J. N., Braswell, L., & Raney, D. (1982). A cross-cultural comparison of American Indian, black & white inpatients on the MMPI and persisting symptoms. Journal of Consulting and Clinical Psychology, 51, 587-594.
Cattell, R. B. (1963). Theory of fluid and crystallized intelligence: A critical experiment. Journal of Educational Psychology, 54, 1-22.
Cole, N. S. (1981). Bias in testing. American Psychologist, 36, 1067-1077.
Cuellar, I., Arnold, B., & Maldonado, R. (1995). Acculturation Rating Scale for Mexican American II: A revision of the original ARSMA scale. Hispanic Journal of Behavioral Sciences, 17, 275-304.
Cuellar, I., Harris, L. C., & Jasso, R. (1980). An acculturation scale for Mexican American normal and clinical population. Hispanic Journal of Behavioral Sciences, 2, 197-217.
Dahlstrom, W. G., Diehl, L. A., & Lachar, D. (1986). MMPI correlates of the demographic characteristics of black and white normal adults. In W. G. Dahlstrom, D. Lachar, & L. E. Dahlstrom (Eds.), MMPI patterns of American minorities (pp. 104-138). Minneapolis: University of Minnesota Press.
Dahlstrom, W. G., Lachar, D., & Dahlstrom, L. E. (Eds.) (1986). MMPI patterns of American minorities. Minneapolis: University of Minnesota Press.
Dahlstrom, W. G., Welsh, G. S., & Dahlstrom, L. E. (1972). An MMPI handbook. Volume I: Clinical interpretation. Minneapolis: University of Minnesota Press.
Dana, R. H. (1984). Intelligence testing of American Indian children: Sidesteps in quests of ethical practice. White Cloud Journal, 3, 35-43.
Dana, R. H. (1995). Culturally competent MMPI assessment of Hispanic populations. Hispanic Journal of Behavioral Sciences, 17, 305-319.
Denno, D., Meijs, B., Nachshon, I., & Aurand, S. (1982). Early cognitive functioning: Sex and race differences. International Journal of Neuroscience, 16, 159-172.
Drasgow, F. (1972). Biased test items and differential validity. Psychological Bulletin, 92, 526-531.

Drasgow, F., & Hulin, C. L. (1987). Cross-cultural measurement. Revista Interamericana de Psicologia, 21, 1-24.
Eells, K. (1951). Intelligence and cultural differences. Chicago: University of Chicago Press.
Feuerstein, R., Rand, Y., & Hoffman, M. (1979). The dynamic assessment of retarded performers: The Learning Potential Assessment Device, theory, instruments, and techniques. Baltimore: University Park Press.
Flaugher, R. L. (1978). The many definitions of test bias. American Psychologist, 33, 671-679.
Fowers, B. J., & Richardson, F. C. (1996). Why is multiculturalism good? American Psychologist, 51, 609-621.
Fox, C. H., & Zirkin, B. (1984). Achievement tests. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment. New York: Pergamon Press.
Fradd, S., & Hallman, C. L. (1983). Implications of psychological and educational research for assessment and instruction of culturally and linguistically different students. Learning Disability Quarterly, 6, 468-478.
Franco, J. N. (1983). An acculturation scale for Mexican-American children. Journal of General Psychology, 108, 175-181.
Franzen, M. D. (in press). Reliability and validity in neuropsychological assessment (2nd ed.). New York: Plenum Press.
Goddard, H. H. (1912). The Kallikak family: A study in the heredity of feeble-mindedness. New York: Macmillan.
Gonzales, R. R., & Roll, S. (1985). Relationship between acculturation, cognitive style, and intelligence: A cross-sectional study. Journal of Cross-Cultural Psychology, 16, 190-205.
Gould, S. J. (1981). The mismeasure of man. New York: W. W. Norton & Company.
Gould, S. J. (1996). The mismeasure of man. New York: W. W. Norton & Company.
Graziano, W., Varca, P., & Levy, J. (1982). Race of examiner effects and the validity of intelligence tests. Review of Educational Research, 52, 469-497.
Greene, R. L. (1987). Ethnicity and MMPI performance: A review. Journal of Consulting and Clinical Psychology, 55, 497-512.
Greenfield, P. M. (1997). You can't take it with you. Why ability assessments don't cross cultures. American Psychologist, 52, 1115-1124.
Grubb, N. J. (1987). Intelligence at the low end of the curve: Where are the racial differences? Journal of Black Psychology, 14, 25-34.
Guthrie, R. (1976). Even the rat was white. New York: Harper & Row.
Gynther, M. D. (1981). Is the MMPI an appropriate device for blacks? Journal of Black Psychology, 7, 67-75.
Gynther, M. D., & Green, S. B. (1980). Accuracy may make a difference but does a difference make for accuracy? A response to Pritchard and Rosenblatt. Journal of Consulting and Clinical Psychology, 48, 268-272.
Hall, Ch. C. (1997). Cultural malpractice. The growing obsolescence of psychology. American Psychologist, 52, 642-651.
Helms, J. E. (1992). Why is there no study of cultural equivalence in standardized cognitive ability testing? American Psychologist, 47, 1083-1101.
Herrnstein, R. J., & Murray, C. (1994). The bell curve. New York: The Free Press.
Hickman, J. A., & Reynolds, C. R. (1986-1987). Are race differences in mental test scores an artifact of psychometric methods? A test of Harrington's experimental model. Journal of Special Education, 20, 409-430.
Hiltonsmith, R. W. (1984). Predicting WAIS-R scores from the Revised Beta for low functioning minority group offenders. Journal of Clinical Psychology, 40, 1063-1066.
Hoffmann, T., Dana, R., & Bolton, B. (1985). Measured acculturation and MMPI-168 performance of Native American adults. Journal of Cross-Cultural Psychology, 16, 243-256.
Holland, J. L. (1979). The self-directed search professional manual. Palo Alto, CA: Consulting Psychologists Press.
Howard, A., Pion, G. M., Gottfredson, G. D., Flattau, P. E., Oskamp, S., Pfafflin, S. M., Bray, D. W., & Burstein, A. G. (1986). The changing face of American psychology: A report from the Committee on Employment and Human Resources. American Psychologist, 41, 1311-1327.
Humphries, L. G. (1986). An analysis and evaluation of test and item bias in the prediction context. Journal of Applied Psychology, 71, 327-333.
Jackson, G. D. (1975). On the report of the Ad Hoc Committee on educational tests with disadvantaged students: Another psychological view from the Association of Black Psychologists. American Psychologist, 30, 88-93.

Jensen, A. (1980). Bias in mental testing. New York: Free Press.
Jensen, A. R. (1986). Construct-validity and test bias. Phi Delta Kappan, 58, 340-346.
Kim, U., & Choi, S-H. (1994). Individualism, collectivism, and child development: A Korean perspective. In P. M. Greenfield & R. R. Cocking (Eds.), Cross-cultural roots of minority child development (pp. 227-257). Hillsdale, NJ: Erlbaum.
Knight, G. P., & Kagan, S. (1977). Acculturation of prosocial and competitive behaviors among second- and third-generation Mexican-American children. Journal of Cross-Cultural Psychology, 8, 273-284.
Kuder, G. F. (1966). General manual: Occupational interest survey form DD. Chicago: Science Research Association.
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd ed.). Chicago: University of Chicago Press.
Lachar, D., Dahlstrom, W. G., & Moreland, K. L. (1986). Relationship of ethnic background and other demographic characteristics to MMPI patterns in psychiatric samples. In W. G. Dahlstrom, D. Lachar, & L. E. Dahlstrom (Eds.), MMPI patterns of American minorities (pp. 139-178). Minneapolis: University of Minnesota Press.
Lampley, D. A., & Rust, J. V. (1986). Validation of the Kaufman Assessment Battery for Children with a sample of preschool children. Psychology in the Schools, 23, 131-137.
Lawson, W. B. (1987). Racial and ethnic factors in psychiatric research. Hospital and Community Psychiatry, 37, 50-54.
Leahey, T. H. (1997). A history of psychology (4th ed.). Englewood Cliffs, NJ: Prentice-Hall.
Lethermun, V. R., Williamson, D. A., Moody, S. C., & Wozniak, P. (1986). Racial bias in behavioral assessment of children's social skills. Journal of Psychopathology and Behavioral Assessment, 8, 329-332.
Lindenmann, J. E., & Matarazzo, J. D. (1984). Intellectual assessment of adults. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment (pp. 77-79). New York: Pergamon Press.
Linn, R. L., & Drasgow, F. (1987). Implications of the Golden Rule settlement for test construction. Educational Measurement: Issues and Practice, 6, 13-17.
Lopez, S., & Romero, A. (1988). Assessing the intellectual functioning of Spanish-speaking adults: Comparison of the EIWA and the WAIS. Professional Psychology: Research and Practice, 19, 263-270.
McGovern, T. V., Furumoto, L., Halpern, D. F., Kimble, G. A., & McKeachie, W. J. (1991). Liberal education, study in depth, and the arts and sciences major--Psychology. American Psychologist, 46, 598-605.
Magana, J. R., de la Rocha, O., Amsel, J., Magana, H. A., Fernandez, M. I., & Rulnick, S. (1996). Revisiting the dimensions of acculturation: Cultural theory and psychometric practice. Hispanic Journal of Behavioral Sciences, 18, 444-468.
Maheady, L., Towne, R., Algozzine, B., Mercer, J., & Ysseldyke, J. (1983). Minority overrepresentation: A case for alternative practices prior to referral. Learning Disability Quarterly, 6, 448-456.
Malgady, R. G. (1996). The question of cultural bias in assessment and diagnosis of ethnic minority clients: Let's reject the null hypothesis. Professional Psychology: Research and Practice, 27, 73-77.
Marin, G., Sabogal, F., Marin, B., & Otero-Sabogal, R. (1984). Development of a short acculturation scale for Hispanics. Hispanic Journal of Behavioral Sciences, 9, 183-205.
Montgomery, G. T., & Orozco, S. (1984). Validation of a measure of acculturation for Mexican Americans. Hispanic Journal of Behavioral Sciences, 6, 53-63.
Naglieri, J. A., & Das, J. P. (1996). Das Naglieri Cognitive Assessment System. Chicago: Riverside.
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., Halpern, D. F., Loehlin, J. C., Perloff, R., Sternberg, R. J., & Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77-101.
Nenty, H. J. (1986). Cross-culture bias analysis of Cattell Culture-Fair Intelligence Test. Perspectives in Psychological Researches, 9, 1-16.
Olmedo, E. L. (1981). Testing linguistic minorities. American Psychologist, 36, 1078-1085.
Ortega-Esteban, J., Ledesma-Sanz, A., Lopez-Sanchez, F., & Prieto-Adanez, G. (1983). Profit of the academically successful student in the Spanish universities. Scientia Paedagogica Experimentalis, 20, 62-82.

Padilla, A. M. (1985). Acculturation and stress among immigrants and later generation individuals. Spanish-Speaking Mental Health Research Center: Occasional Paper, 20, 11-60.
Paspalanova, E., & Shtetinski, D. (1985). Standardization of the CF 2A Intelligence Test of Cattell for Bulgarian Population. Psikhologiia Bulgaria, 1985, 12-22 (translation).
Perez-Arce, P., & Puente, A. E. (1996). Neuropsychological assessment of ethnic minorities: The case of assessing Hispanics living in North America. In R. Sbordone & C. Long (Eds.), Ecological validity in neuropsychological testing. Delray Beach, FL: St. Lucie Press.
Phinney, J. S. (1996). When we talk about American ethnic groups, what do we mean? American Psychologist, 51, 918-927.
Prasse, D. (1979). Federal legislation and school psychology: Impact and implication. Professional Psychology, 9, 592-601.
Puente, A. E. (1993). Towards a psychology of variance. In T. McGovern (Ed.), Enhancing the undergraduate psychology curriculum. Washington, DC: American Psychological Association.
Puente, A. E., & Salazar, G. (1998). Assessment of minority and culturally diverse children. In A. Prifitera & D. Saklofske (Eds.), WISC-III. Clinical use and interpretation. New York: Academic Press.
President's Commission on Mental Health (1978). A report to the President from the President's Commission on Mental Health. Washington, DC: U.S. Government Printing Office.
Ravishankar, V. (1982). A correlational study of Cattell's personality factor B and I.Q. as measured by his culture free test. Indian Psychological Review, 22, 9-11.
Reschly, D. J. (1984). Aptitude tests. In G. Goldstein & M. Hersen (Eds.), Handbook of psychological assessment. New York: Pergamon Press.
Restandardization Committee of the University of Minnesota Press. (1989). MMPI-2 Minnesota Multiphasic Personality Inventory. Critical Items, 4(2), 1-4.
Reynolds, C. R., & Brown, R. T. (1984). Bias in mental testing. In C. R. Reynolds & R. T. Brown (Eds.), Perspectives on bias in mental testing. New York: Plenum Press.
Reynolds, C. R., Chastain, R. L., Kaufman, A. S., & McLean, J. E. (1987). Demographic characteristic of IQ among adults: Analysis of the WAIS-R standardization sample as a function of the stratification variables. Journal of School Psychology, 25, 323-342.
Roberts, R. J., & Hamsher, K. D. (1984). Effects of minority status on facial recognition and naming performance. Journal of Clinical Psychology, 40, 539-545.
Ross-Reynolds, J., & Reschly, D. J. (1983). An investigation of the item bias on the WISC-R with four sociocultural groups. Journal of Consulting and Clinical Psychology, 51, 144-146.
Sandoval, J. (1979). The WISC-R and internal evidence of test bias with minority group. Journal of Consulting and Clinical Psychology, 47, 919-927.
Sandoval, J., & Miille, M. P. W. (1980). Accuracy of judgments of WISC-R item difficulty for minority groups. Journal of Consulting and Clinical Psychology, 48, 249-253.
Scarr, S. (1988). Race and gender as psychological variables: Social and ethical issues. American Psychologist, 43, 56-60.
Schultz, D. P., & Schultz, S. E. (1996). A history of modern psychology (6th ed.). New York: Academic Press.
Slate, N. (1983). Nonbiased assessment of adaptive behavior: Comparison of three instruments. Exceptional Children, 50, 67-70.
Spitzer, R. L., Endicott, J., & Fleiss, W. (1967). Instruments and recording forms for evaluating psychiatric status and history: Rationale, method of development and description. Comprehensive Psychiatry, 8, 321-343.
Spitzer, R. L., & Williams, J. B. W. (1980). Classification of mental disorders and DSM-III. In H. I. Kaplan, A. M. Freedman, & B. J. Sadock (Eds.), Comprehensive textbook of psychiatry-III. Baltimore: Williams and Wilkins.
Sternberg, R. J. (1996). Successful intelligence: How practical and creative intelligence determine success in life. New York: Simon & Schuster.
Sternberg, R. J. (1997). The concept of intelligence and its role in lifelong learning and success. American Psychologist, 52, 1030-1037.
Sternberg, R. J., & Williams, W. M. (1997). Does the Graduate Record Examination predict meaningful success in the graduate training of psychologists? American Psychologist, 52, 630-641.
Stepanile, C. (1982). Contributo per una taratura italiana de test culture fair di Cattell. Bollettino & Applicata (161-164), 81-86 (translation).
Sue, S., & Zane, N. (1987). The role of culture and cultural techniques in psychotherapy. American Psychologist, 42, 37-45.

Suinn, R. M., Richard-Figueroa, K., Len, S., & Vigil, Weschler, D. A. (1960). Escala de Inteligencia de P. (1987). The Suinn-Law Asian Self Identity Weschler-Adultos. New York: Psychological Cor- Acculturation Scale: An initial report. Educational poration. and Psychological Measurement, 6, 103-112. Westermeyer, J. (1987a). Cultural factors in clinical Suzuki, L. A., & Valencia, R. R. (1997). Race-ethnic- assessment. Journal of Consulting and Clinical ity and measured intelligence. Americant Psychol- Psychology, 55, 471-478. ogist, 52, 1103-1114. Westermeyer, J. (1987b). Clinical considerations in Terman, L. W. (1916). The measurement of intelli- cross-cultural diagnosis. Hospital and Community gence. Boston: Houghton-Mifflin Company. Psychiatry, 38, 160-165. Triandis, H. C. (1982). Acculturation and bicultura- Whitworth, R. H., & Gibbons, R. T. (1986). Cross- tion indices among relatively accultured Hispanic racial comparison of the WAIS and WAIS-R. youths. Revista Interamericana de Psicologia, 16, Educational and Psychological Measurement, 46, 140-149. 1041-1049. U.S. Bureau of the Census (1992). Survey of Busi- Williams, C. L. (1987). Issues surrounding psycho- ness Owners and Self-Employed Pearsons: Form logical testing of minority patients. Hospital and MB-1. Washington, D.C.: Department of Com- Community Psychiatry, 38, 184-189. merce. World Health Organization (1973). International Velazquez, R. J., Gonzales, M., Butcher, J. N., pilot project on schizophrenia. Geneva, Switzerp- Castillo-Canez, I., Apocada, J. X., & Chavira, D. land: Author. (1997). Use of the MMPI-2 with Chicanos: Yerkes, R. N. (1923, March). Testing the human Strategies for counselors. Journal of Multicul- mind. Atlantic Monthly, pp. 358-370. tural Counseling and Development, 25, 107-120. Zeidner, M. (1987). Test of the cultural bias hypothe- Violato, C. (1984). The effects of Canadianization of sis: Some Israeli findings. 
Journal of Applied Psy- American-biased items on the WAIS and WAIS-R chology, 72, 38-48. information subtests. Canadian Journal of Behav- Zubin, J. (1965). Cross-national study of diagnosis ioral Science, 16, 36-41. of the mental disorder: Methodology and plan- Wechsler, D. (1991). Wechsler Intelligence Scale for ning. American Journal of Psychiatry, 125, 12- Children-Third Edition: Manual. San Antonio: 20. The Psychological Corporation. Zuckerman, M. (1990). Some dubious premises in Weiss, J. (1987). The Golden Rule bias reduction research and theory on racial differences: Scien- principle: A practical reform. Educational Mea- tific, social, and ethical issues. American Psychol- surement Issues and Practice, 6, 23-25. ogist, 45, 1297-1303.