
ACHIEVEMENT MOTIVATION

THE IMPORTANCE OF ACHIEVEMENT MOTIVATION

Human life can be described as continuous work at tasks, which individuals may or may not master successfully. Research on achievement motivation aims at a better understanding of individual performance and the nature of human resources, as well as at the development of assessment and intervention techniques to increase achievement motivation. Tasks in industrial settings and in service organizations are becoming more and more complex and are subject to dynamic changes arising from changing market demands. To keep individuals highly achievement motivated while doing their jobs, tasks have to be designed with high motivating potentials.

From a motivational perspective the action process is divided into two parts. The first part describes the development of achievement motivation as a consequence of a fit between the achievement motive and the achievement-oriented motivating potentials of the situation. Achievement motivation that initiates action arises through the interaction of the achievement-oriented motivating potentials of the task in its situational context and the strength of the achievement motive on the side of the performing person. Personal goals controlling actions result directly from the strength of this achievement motivation (Figure 1). The second part of the motivation process, responsible for the translation of motivation into action, is often called the volitional phase in the control of behaviour (Heckhausen, 1989); during this phase, goal-oriented action turns into outcomes controlled by the degree of goal commitment. Goal commitment affects the way persons choose to reach their goals and the selection of strategies they pursue (Brandtstädter & Renner, 1990). Examples of such strategies are to pursue a goal persistently even in the face of hindrances or to adapt flexibly to changing aspects of the situation. The translation process works better when more specific and concrete goals are set; the higher the goal commitment, the more effective the chosen strategies of goal pursuit (Vroom, 1964; Locke & Latham, 1990; Kleinbeck, in press).

A goal-oriented course of action immune to disturbances is especially supported by specific and concrete goals (goal characteristics; Figure 1).

Because of the many single concepts subsumed under the label of achievement motivation, it is necessary to develop as many measurement tools as possible to differentiate between the concepts. Outside current research projects, measures of achievement motivation are principally used in industrial settings, in service organizations and in educational fields. Here achievement motivation measurement is used to investigate the motivating

[8.8.2002–12:29pm] [1–128] [Page No. 1] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword 2 Achievement Motivation Motivational process

[Figure 1 diagram. Motivational process to set goals: achievement motive; achievement-oriented motivating potential; achievement motivation to set goals. Motivational process to translate goals into action (volitional process): goal commitment; goal characteristics; strategies in pursuing goals; achievement motivation to translate goals into actions. Outcomes.]

Figure 1. Components of achievement motivation.

potentials of work tasks and work contexts to make full use of individual resources.

INSTRUMENTS TO ASSESS ACHIEVEMENT MOTIVATION

The theory of achievement motivation describes performance as multidimensional and as influenced by many different factors. The main personal factor is the achievement motive; the main task-specific factor is the motivating potential of the situation. For diagnostic information about mode and strength of the achievement motive there are three different sources (see Schneider & Schmalt, 2000: 50–56):

1 Self-judgement
2 Judgement by others
3 Behavioural indices

To assess the strength of the achievement motive, different strategies are used according to these sources: operant procedures (e.g. the Thematic Apperception Test – TAT), respondent procedures (e.g. questionnaires), and the grid technique, which according to Schmalt (1999) lies, in its methodological background, between the first two types of measurement. Due to this fact, one can differentiate implicit and explicit components of the achievement motive. Using the material of the TAT with pictorial presentations of situations, it becomes possible to penetrate implicitly into the achievement motive system, because this kind of measurement allows one to approach materials of memory relevant for the motive system. Filling out questionnaires requires ego involvement, self-insight and self-reflection, and also explicit memory, because the answers to the questions can only be given with the help of conscious reflection on earlier experiences (Graf & Schacter, 1985: 501).

Schmalt & Sokolowski (2000) discuss the quality of the different techniques to measure the achievement motive and conclude that all available instruments work reliably. The TAT and the grid technique have comparable and widely diversified validity ranges that are related to respondent and operant behaviour. Questionnaires used to diagnose motives seem to be specialized to predict respondent behaviour and conscious experiences (Spangler, 1992).

To measure the achievement-oriented motivating potentials of tasks, Hackman & Oldham (1975) developed and presented an instrument, the Job Diagnostic Survey (JDS), whose validity is well established (Fried & Ferris, 1987). The JDS measures the motivating potentials of tasks in work situations and also of tasks that students are confronted with in learning situations (Schmidt & Kleinbeck, 1999). Measuring the achievement motive and the motivating potentials of tasks allows one to determine the strength of achievement motivation.

Rheinberg, Vollmeyer & Burns (in press) present an instrument to measure achievement motivation as a comprehensive construct. With 18 items, four components of the current state of achievement motivation are measured: (1) fear of failure; (2) probability of success; (3) interest; (4) challenge. In its German and English versions, the instrument shows satisfactory consistencies and, according to the first validation data, the measured components of current achievement motivation correlate positively with learning behaviour and performance.

Schuler & Prochaska (2001) define achievement motivation as a general behavioural orientation. The instrument they developed – the Hohenheim Test of Achievement Motivation (HLMT) – allows achievement motivation to be measured with 17 scales in a highly differentiated way. The results of the HLMT measures correlate significantly with neuroticism and conscientiousness in the five-factor model of personality (Costa & McCrae, 1989). Measures in the HLMT are positively related to success at school, university and work, so that one can expect a successful application in personality research and in educational and occupational testing.

To measure goal characteristics (e.g. goal specificity and goal difficulty) that influence the achievement-oriented process of translating goals into action, Locke & Latham (1984, 1990) present a questionnaire that has been used mainly in research settings. Other questionnaires try to measure clarity of tasks and goals (Sawyer, 1992), clarity of methods (Breaugh & Colihan, 1994; Schmidt & Hollmann, 1998) and also clarity of performance judgements (Breaugh & Colihan, 1994; Kleinbeck & Fuhrmann, 2000). The components of achievement motivation measured by these questionnaires affect the motivation to translate goals into action and, as a consequence, performance outcome.

Recently researchers began to measure goal commitment (Hollenbeck et al., 1989). They invested considerable effort because goal setting is not a homogeneous construct. As Tubbs (1993) showed, there are three different components of goal commitment. The first component has to do with processes of weighing and evaluating the potential goals. During these processes, one calculates mainly values and expectancies that affect the strength of motivational tendencies for specific goals. The second component contains the result of these evaluative processes, focussing on calculations of values and expectancies and leading to setting a personal goal. This component is also related to the decision to attain this

Table 1. Instruments for measuring components of achievement motivation

Instruments | Author | Concepts measured | Method used
TAT | Murray, 1943; McClelland et al., 1953 | Achievement motive and other motives | Content analysis of stories (operant)
OMT | Kuhl & Scheffer, 2000 | Achievement motive | Content analysis of written stories (operant)
MARPS | Mehrabian, 1968 | Achievement motive | Questionnaire (respondent)
Grid-technique | Schmalt & Sokolowski, 1999 | Achievement motive | Judgement of fit between pictures and motive-related statements
Questionnaire for current motivational states | Rheinberg et al., 2000 | Current motivation for learning and performance | Questionnaire (respondent)
AVEM | Schaarschmidt & Fischer, 1996 | Current work motivation | Questionnaire (respondent)
HLMT | Schuler & Prochaska, 2001 | Achievement motivation | Questionnaire (respondent)
JDS | Hackman & Oldham, 1975 | Motivating potential of tasks | Questionnaire (self-judgement and judgement by others)
Fragebogen für Zielcharakteristika | Locke & Latham, 1990 | Goal specificity and others | Questionnaire (respondent)
Goal commitment | Hollenbeck et al., 1989 | Goal commitment | Questionnaire (respondent)
Strategies of goal pursuit | Brandtstädter & Renner, 1990 | Strategies of goal pursuit | Questionnaire (respondent)


particular goal. The third component of goal commitment is characterized by maintaining the set goal and by staying persistent even when faced with hindrances. Future research will show whether it will be possible to develop differentiated measurement procedures on the basis of these considerations.

With respect to goal commitment in goal-oriented action, people seem to be able to use stable dispositions. They either persist tenaciously in pursuing their goals or they adjust flexibly to new or other goals. Brandtstädter & Renner (1990) described two scales to measure ‘tenacious goal pursuit’ and ‘flexible goal adjustment’. Their results show relations between these different strategies and age. Older people more often adapt flexibly instead of pursuing their goals tenaciously against hindrances. Table 1 summarizes the instruments for measuring components of achievement motivation.

FUTURE PERSPECTIVES

The current state of research can be described as presenting a set of different measurement approaches for the central components of achievement motivation. Future tasks for research and applications, mainly in work and educational settings, will be to determine the range of validity for the different measures more exactly. This can help to decide under what circumstances specific instruments can be used profitably. Although there are now some reliable and valid instruments to measure single components of achievement motivation, it would be helpful to have new instruments and procedures to relate them to each other.

CONCLUSIONS

A high achievement motivation in people guarantees success and wealth in human societies. To produce adequate conditions for the development of a high achievement motivation it is necessary to understand how achievement motivation is formed and how it can be translated into successful action. In accordance with the importance of this kind of motivation, a series of instruments has been designed to measure the different components of achievement motivation reliably, validly and practically. The existing instruments can be used in research and practical settings.

References

Brandtstädter, J. & Renner, G. (1990). Tenacious goal pursuit and flexible goal adjustment: explications and age-related analysis of assimilative and accommodative strategies of coping. Psychology and Aging, 5, 58–67.
Breaugh, J.A. & Colihan, J.P. (1994). Measuring facets of job ambiguity: construct validity evidence. Journal of Applied Psychology, 79, 191–202.
Costa, P.T. & McCrae, R.R. (1989). The NEO PI/FFI Manual Supplement. Odessa, FL: Psychological Assessment Resources.
Fried, Y. & Ferris, G. (1987). The validity of the job characteristics model: a review and meta-analysis. Personnel Psychology, 40, 287–322.
Graf, P. & Schacter, D.L. (1985). Implicit and explicit memory for new associations in normal and amnesic subjects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 501–518.
Hackman, J.R. & Oldham, G.R. (1975). Development of the Job Diagnostic Survey. Journal of Applied Psychology, 60, 159–170.
Heckhausen, H. (1989). Motivation und Handeln. Berlin: Springer.
Hollenbeck, J.R., Klein, H.J., O’Leary, A.M. & Wright, P.M. (1989). Investigation of the construct validity of a self-report measure of goal commitment. Journal of Applied Psychology, 74, 951–956.
Kleinbeck, U. (in press). Das Management von Arbeitsgruppen. In Schuler, H. (Ed.), Lehrbuch der Personalpsychologie.
Kleinbeck, U. & Fuhrmann, H. (2000). Effects of a psychologically based management system on work motivation and productivity. Applied Psychology: An International Review, 49, 596–610.
Kleinbeck, U. & Schmidt, K.-H. (in press). Gruppenleistung und Leistungsförderung. In Schuler, H. (Ed.), Organisationspsychologie, Enzyklopädie der Psychologie. Göttingen: Hogrefe.
Kuhl, J. & Scheffer, D. (2000). Auswertungsmanual für den Operanten Multi-Motiv-Test (OMT). Osnabrück: unpublished manuscript.
Locke, E.A. & Latham, G.P. (1984). Goal-Setting: a Motivational Technique that Works. Englewood Cliffs, NJ: Prentice Hall.
Locke, E.A. & Latham, G.P. (1990). A Theory of Goal Setting and Task Performance. Englewood Cliffs, NJ: Prentice Hall.
McClelland, D.C., Atkinson, J.W., Clark, R.A. & Lowell, E.L. (1953). The Achievement Motive. New York: Appleton-Century-Crofts.
Mehrabian, A. (1968). Male and female scales of the tendency to achieve. Educational and Psychological Measurement, 28, 493–502.
Murray, H.A. (1943). Thematic Apperception Test Manual. Cambridge: Harvard University Press.
Rheinberg, F., Vollmeyer, R. & Burns, B.D. (in press). FAM: Ein Fragebogen zur Erfassung aktueller Motivation in Lern- und Leistungssituationen.
Sawyer, J.E. (1992). Goal and process clarity: specification of multiple constructs of role ambiguity and a structural equation model of their antecedents and consequences. Journal of Applied Psychology, 77, 130–142.
Schaarschmidt, U. & Fischer, A. (1996). AVEM – Arbeitsbezogenes Verhaltens- und Erlebensmuster (Manual). Frankfurt am Main: Swets Testservices.
Schmalt, H.-D. (1999). Assessing the achievement motive using the grid technique. Journal of Research in Personality, 33, 109–130.
Schmalt, H.-D. & Schneider, K. (2000). Motivation. Stuttgart: Kohlhammer.
Schmalt, H.-D. & Sokolowski, K. (2000). Zum gegenwärtigen Stand der Motivdiagnostik. Diagnostica, 46, 115–123.
Schmidt, K.-H. & Hollmann, S. (1998). Eine deutschsprachige Skala zur Messung verschiedener Ambiguitätsfacetten bei der Arbeit. Diagnostica, 44, 21–29.
Schmidt, K.-H. & Kleinbeck, U. (1999). Job Diagnostic Survey (JDS – deutsche Fassung). In: Dunckel, H. (Ed.), Handbuch psychologischer Arbeitsanalyseverfahren (pp. 205–230). Zürich: vdf.
Schmidt, K.-H. & Kleinbeck, U. (in press). Leistung und Leistungsförderung. In Schuler, H. (Ed.), Organisationspsychologie, Enzyklopädie der Psychologie. Göttingen: Hogrefe.
Schuler, H. & Prochaska, M. (in press). Entwicklung und Konstruktvalidierung eines berufsbezogenen Leistungsmotivationstests.
Spangler, W.D. (1992). Validity of questionnaire and TAT measures of need for achievement: two meta-analyses. Psychological Bulletin, 112, 140–154.
Tubbs, M.E. (1993). Commitment as a moderator of the goal-performance relation: a case of clearer construct definition. Journal of Applied Psychology, 78, 86–97.
Vroom, V.H. (1964). Work and Motivation. New York: Wiley.

Uwe Kleinbeck

Related Entries

Applied Fields: Organizations, Applied Fields: Work and Industry, Personnel Selection, Leadership in Organizational Settings, Leadership Personality, Motivation

ACHIEVEMENT TESTING

INTRODUCTION

Achievement testing plays a central role in education, particularly given the current context of high-stakes educational reform seen in countries like the United States. This entry provides a brief overview of achievement testing beginning with a description of its role in education. Different types of achievement tests, commonly used derived scores, recent advances such as performance assessments, and future directions are described.

ACHIEVEMENT TESTING AND ITS ROLE

Achievement tests are designed to measure the knowledge and skills that individuals learn in a relatively well-defined area through formal or informal educational experiences. Thus, achievement tests include tests designed by teachers for use in the classroom and standardized tests developed by school districts, states, national and international organizations, and commercial test publishers.

Achievement tests have been used for: (a) summative purposes such as measuring student achievement, assigning grades, grade promotion and evaluation of competency, comparing student achievement across states and nations, and evaluating the effectiveness of teachers, programmes, districts, and states in accountability programmes; (b) formative purposes such as identifying student strengths and weaknesses, motivating students, teachers, and administrators to seek higher levels of performance, and informing educational policy; and (c) placement and diagnostic purposes such as selecting and placing students, and diagnosing learning disabilities, giftedness, and other special needs.


The most controversial uses of achievement testing have been in high-stakes accountability programmes and minimum competency testing (MCT). Accountability practices vary and may range from financial rewards for improved performance, to remediation for students who perform poorly, to sanctions such as public hearings, staff dismissals, and dissolution of districts. Two negative consequences that have been associated with high-stakes accountability are a pattern of inflated achievement results, as highlighted by Cannell’s (1988) finding that all states were reporting that their students were scoring above the national norm (the ‘Lake Wobegon effect’), and the narrowing of instruction or ‘teaching to the test’ so that student scores compare favourably to norms.

MCT programmes were implemented in response to concerns about high levels of illiteracy and innumeracy and subsequent poor ‘work force readiness’ among high school graduates. In addition to course completion requirements, such programmes require students to pass tests of minimal basic skills (usually in reading, writing, and arithmetic) to graduate from high school. Legal cases such as Debra P. v. Turlington raised questions about what constitutes minimum competency, whether the skills assessed are reflected in the school curriculum, and whether students have been given adequate opportunity to learn the skills required.

STANDARDIZED ACHIEVEMENT TESTS

Standardized tests may be classified using the overlapping categories of purpose, breadth, administration, item format, and interpretation.

Purpose

Screening tests tend to be relatively brief, with only one subtest covering each subject area. These tests are useful in determining whether more expensive comprehensive testing is warranted. Screening tests include the Wechsler Individual Achievement Test (WIAT) – Screener, Wide Range Achievement Test – 3, and Basic Achievement Skills Individual Screener (BASIS). Comprehensive or diagnostic tests typically include more than one subtest per subject area so each can be explored in depth. Examples of these tests include the WIAT – Comprehensive Test, Woodcock–Johnson Complete Battery III, Gates–McKillop–Horowitz Reading Diagnostic Test, Comprehensive Tests of Basic Skills, and Terra Nova.

Breadth

Single-subject tests include a number of subtests ranging from lower to higher skill levels to assess different aspects of a subject area. Single-subject tests include the Woodcock Reading Mastery Tests – Revised and KeyMath – Revised. Multiple-subject tests assess at least the three commonly taught subject areas of reading, mathematics, and written language. Such tests include the Iowa Tests of Basic Skills, California Achievement Test, SRA Achievement Series, Stanford Achievement Test Series, and Tests of Achievement and Proficiency.

Administration

Group-administered achievement tests are usually multiple-subject tests that contain comparable subtests for students in different grades. These tests usually are administered within the classroom and are used throughout school districts or states. Examples include the Iowa Tests of Basic Skills, Metropolitan Achievement Test 8, Iowa Tests of Educational Development, Gray Oral Reading Test – 3, and Sequential Tests of Educational Progress – III. Individually administered achievement tests may include single- or multiple-subject tests and typically are administered in clinical and educational settings. Such tests include the Kaufman Test of Educational Achievement, Wide Range Achievement Test – III, Gates–MacGinitie Reading Test, and Peabody Individual Achievement Test – Revised.

Item Format

Fixed-response items include multiple-choice, true–false, matching, and stem completion items. A key advantage of fixed-response items is that considerable material can be covered in a relatively short period of time. Criticisms of these items are that they emphasize recall of facts over higher order thinking and problem-solving, they are susceptible to guessing and testwiseness, and they discourage creative thinking. They also tend to be difficult items to prepare. Nonetheless, multiple-choice items are the most common item format used in standardized achievement tests.

Constructed items include short answer and essay responses. The advantages of constructed items are that they require students to construct a response rather than simply recognize the correct answer, they assess students’ ability to organize, connect ideas, and problem-solve, they reduce the impact of guessing, and preparation of questions is relatively quick and easy. Disadvantages of constructed items are that relatively few questions can be asked and thus adequate coverage of the subject area may not occur, they are susceptible to bluffing, and scoring is time-consuming, requires considerable subjective judgement, and is less reliable than scoring of fixed-response items.

Interpretation

When achievement test results are interpreted with reference to a normative group, the test is referred to as a norm-referenced test (NRT). Students’ NRT scores usually are expressed in age- or grade-equivalent scores, standard scores, or percentiles. NRTs are designed to discriminate among students’ performance; they do not provide information on how much has been learned. Most of the tests discussed already are NRTs. When test results are interpreted in terms of whether each student has mastered specific knowledge and skills without reference to other students or a normative group, the test is said to be a criterion-referenced test (CRT). Students’ CRT scores are usually expressed as percent correct or by descriptors such as mastery/non-mastery. Most CRTs are developed by schools or states. Examples are the Basic Skills Assessment Program, Kentucky Instructional Results Information System, and Louisiana Educational Assessment Program (LEAP 21). Some NRTs, such as BASIS, also provide CRT interpretations.

DERIVED SCORES ASSOCIATED WITH ACHIEVEMENT TESTS

Raw scores obtained on achievement tests typically are converted to derived scores, so that comparisons can be made among test scores. Commonly found derived scores include age- or grade-equivalent scores, standard scores, and percentile scores. Age- or grade-equivalent scores (‘developmental scores’) reflect average performance at different age and grade levels. These scores are often (a) misinterpreted when individual performance is compared to the wrong reference group, and (b) inappropriately used as standards of performance when teachers and parents expect all students of a particular age or grade to achieve age- or grade-equivalent scores.

Standard scores provide an indication of a student’s relative performance on a test in terms of how far his/her score is from the mean in standard deviation units. Common types of standard scores are z-scores, T-scores, deviation IQ scores, and stanines. Standard scores are the most highly recommended derived scores.

Percentiles (percentile ranks) indicate the point in a distribution at or below which the scores of a given percentage of students fall and should not be confused with percentages or percent correct. Percentiles are the most easily interpreted derived scores. However, percentiles do not represent equal intervals across the distribution, which means that they magnify small differences near the mean and minimize large differences in the upper and lower ends of the distribution.

RECENT ADVANCES IN ACHIEVEMENT TESTING

Computer Adaptive Testing

Computer adaptive testing (CAT) attempts to match the difficulty of test items to the knowledge and skill level of the student being assessed by tailoring the test so that a pre-selected sequence of items is administered based on whether or not the response to the previous item is correct. The advantages of CAT over traditional achievement tests include reduced testing time, the need for fewer items at a given level of measurement error, minimized frustration for students who perform poorly, and more precise estimates of achievement across the entire distribution.

Large-Scale Assessments

Large-scale assessments are conducted by the district, state, or nation(s) to examine the educational achievement of groups. The best-known large-scale assessments today are the National Assessment of Educational Progress (NAEP) in the United States and the surveys conducted by the International Association for the Evaluation of Educational Achievement (IEA). The purpose of NAEP, which was first introduced in 1969, was ‘to improve the effectiveness of our Nation’s schools by making objective information about student performance in selected learning areas available to policymakers at the national, State and local levels’ (Public Law 100-297, Section 3401). The IEA has conducted numerous international achievement surveys since its first cross-national survey in 1959 and is best known for the longitudinal Third International Mathematics and Science Study (TIMSS), first conducted in 1995.

Performance Assessment

Increasing attention has been paid to performance assessments (also known as authentic or alternative assessments), which consist of students’ constructed responses to ‘real world’ (authentic) tasks and problems and the cognitive skills and processes involved in the construction of those responses. Examples of performance assessments include portfolios of students’ work over time, poetry, science experiments, conversations in a foreign language, and open-ended mathematics problems. The students’ work is judged using an agreed-upon set of criteria. The advantages of these assessments are that they are meant to measure processes involved in the acquisition of knowledge and skills in ways that make the link between learning and instruction clearer. Disadvantages are that fewer tasks can be included given time constraints, creating agreed-upon criteria for scoring is time-consuming, and judgement of students’ work is highly subjective, all of which make performance assessment expensive and open to bias.

Standards-Based Assessments

Standards-based reform describes efforts to improve education for all students through the setting of high standards. Its beginnings rest with the 1983 National Commission on Excellence in Education report, A Nation at Risk: The Imperative for Educational Reform, and it has culminated in the passing of the ‘Goals 2000: Educate America Act’ by the U.S. Congress in 1994. Standards-based approaches include content and performance standards, assessments that are aligned with these standards, and accountability measures. Content standards define what a student should know and be able to do and thus drive curriculum. Performance standards define how much a student should know and be able to do, and thus set the benchmarks or expected levels of achievement to be used for accountability. Standards-based assessments (also known as standards-referenced testing) are based on content and performance standards, involve multiple measures of student performance, and apply to all students. A critical aspect of such assessments is to produce and use ‘better tests’ such as performance assessments. Accountability measures focus on strengthening standards-based reform initiatives by rewarding teachers and schools whose students meet performance standards and sanctioning those whose students do not.

CONCLUSIONS AND FUTURE PERSPECTIVES

Current and future advances in achievement testing appear to be focussed on the development, improvement, and evaluation of standards-based and performance assessments. Five areas for future development include: (1) best practices for developing, and methods for evaluating, performance assessment scoring rubrics, (2) comparisons of the various types of data to be used in accountability models such as mean scores, value-added data, and residual scores adjusted for socio-economic status, (3) longitudinal research examining the impact of performance and standards-based assessments on student achievement, instructional practices, and student learning, (4) comparisons of traditional standardized testing (including multiple-choice formats) and performance assessments as measures of student achievement, and (5) exploration of computer-based, and notably Internet, delivery and scoring of performance assessments for large-scale assessment.

References

Anastasi, A. & Urbina, S. (1997). Psychological Testing. Upper Saddle River, NJ: Prentice-Hall, Inc.
Cannell, J.J. (1988). Nationally normed elementary achievement testing in America’s public schools: how all 50 states are above the national average. Educational Measurement: Issues and Practice, 7, 5–9.


Hambleton, R.K. & Zaal, J.N. (1991). Advances in Educational and Psychological Testing: Theory and Applications. Boston, MA: Kluwer Academic Publishers.
National Commission on Excellence in Education (1983, April). A Nation at Risk: The Imperative for Educational Reform (Doc. 20402). Washington, DC: U.S. Government Printing Office.

Related Entries

Applied Fields: Education, Achievement Testing, Criterion-Referenced Testing: Models and Procedures, Norm-Referenced Testing: Methods and Procedures, Item Response Theory

Anita M. Hubley
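
As a supplement to the discussion of derived scores in the Achievement Testing entry, the conversions can be written out in a few lines. The Python sketch below is illustrative only. The function names are hypothetical, and the scale constants (T-score mean 50 and SD 10, deviation IQ mean 100 and SD 15, stanine mean 5 and SD 2) are the commonly used conventions rather than values stated in the entry. The last function additionally assumes normally distributed scores, purely to show why percentiles are not an equal-interval scale.

```python
import math
from statistics import NormalDist


def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    """T-score: standard score rescaled to mean 50, SD 10 (assumed convention)."""
    return 50 + 10 * z


def deviation_iq(z, sd=15):
    """Deviation IQ: mean 100; the SD of 15 is an assumption (tests differ)."""
    return 100 + sd * z


def stanine(z):
    """Stanine: mean 5, SD 2, rounded half up and limited to the range 1..9."""
    return max(1, min(9, math.floor(2 * z + 5.5)))


def percentile_rank(raw, norm_scores):
    """Percentage of norm-group scores at or below the raw score."""
    at_or_below = sum(1 for s in norm_scores if s <= raw)
    return 100 * at_or_below / len(norm_scores)


def normal_percentile(z):
    """Percentile of a z-score, assuming a normal distribution of scores."""
    return 100 * NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical norm group with mean 50 and SD 10.
    z = z_score(60, norm_mean=50, norm_sd=10)
    print(z, t_score(z), deviation_iq(z), stanine(z))
    # The same 0.1 SD gain near the mean and in the upper tail.
    print(normal_percentile(0.1) - normal_percentile(0.0))
    print(normal_percentile(2.1) - normal_percentile(2.0))
```

With the hypothetical norm group above, a raw score of 60 corresponds to z = 1, T = 60, a deviation IQ of 115 and a stanine of 7. Under the normality assumption, a gain of 0.1 SD is worth about 4 percentile points at the mean but only about half a point at z = 2, which is the unequal-interval property of percentiles noted in the entry.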

ADAPTIVE AND TAILORED TESTING (INCLUDING IRT AND NON-IRT APPLICATION)

INTRODUCTION

In computerized adaptive testing (CAT), a computer administers the items and gathers the examinee’s responses, but its most distinctive feature is that the items finally administered depend on the examinee’s ability. The test then adapts to the examinee’s performance on the items. The idea of adaptive measurement can be traced back to Binet, but it never became a reality until the appearance of item response theory (IRT) and the development of the computer. However, adaptive testing is also possible without IRT, as will be seen later. The first ideas on CAT appeared in the early 1970s (Lord, 1970). CAT has spent most of the time since then in the laboratory, because the main concern of the researchers has been to obtain the most efficient, precise and possible strategies for item selection. CATs have become operational only in the last decade. Computerized adaptive tests were administered to more than a million people in 1999 (Wainer, 2000). Their main applications are in the areas of personnel selection, educational assessment, certification and licensure. With these practical applications, new concerns such as test security, profitability and social impact have arisen.

BASIC PRINCIPLES

The basic principles of a CAT are well established. Its aim is to apply to each examinee only the items that best serve to assess his/her level of ability. Its main advantage is that more efficient measurements are obtained. It needs fewer items (sometimes, less than half) than conventional tests to achieve the same level of precision as a full-length test. The elements that make up a CAT are: an item pool with known properties, a heuristic to choose the items, a method to evaluate ability and a criterion to end the application. Though they are all important, the efficiency of a CAT mostly depends on two closely related complementary processes: the statistical method of estimating ability and the criterion for item selection. This explains the large number of known procedures and why these are two of the most studied aspects of CAT.

ITEM BANK

A CAT chooses items from a database (item bank) containing the available items and various information about each item, such as its stem, correct and incorrect options, item parameter estimates under an appropriate IRT model, classical item difficulty and discrimination indices, information on the specific domain the item measures, the proportion of times the item has been administered, etc. The bank has to be calibrated and its unidimensionality and acceptable fitting to an IRT model should be checked and accepted. Item banking and IRT are specific entries in this encyclopedia, and further details can be found there.

A CAT does not need a specific item format. A CAT may be developed both for dichotomous and polytomous items, and for multiple-choice or


open-ended items. Items may be visual, auditory and also multimedia items. It is also possible to consider a testlet (cluster of related items) instead of single items as the analysis unit.

An important question to pay attention to is bank size. Well-known high-stakes CATs, such as CAT-ASVAB (Sands et al., 1997), have more than one thousand items, but CATs for other uses ordinarily have smaller banks (even below 150 or 200 items). The number of items also depends on the restrictions the item-selection algorithm has implemented and the IRT model in use.

An item bank should also consider the ability prospective examinees have and the intended test use. It should contain discriminative items for the entire range of ability. The information function of the item bank should match these requirements.

Banks should be updated, both in the information on each item, as this information changes after each item administration, and also in the items themselves, because as the CAT is increasingly administered, new items should be added and old ones removed. Online calibration deals with effective procedures to carry out bank updating.

ADAPTIVE HEURISTICS

A CAT needs four components in order to measure an examinee: (a) a procedure to select the first item to administer; (b) a method to estimate the examinee’s ability and precision after each administered item; (c) an algorithm for selecting the remaining items; and (d) a criterion to end the test administration.

There are some alternatives available for selecting the first item. If we know the examinee’s grade on other variables, and his/her course, this information may be used to predict the examinee’s ability by linear regression. The first item is then selected to match the predicted ability. If no information on the examinee is available, the first item will then match a random ability selected from the central values of the ability distribution.

After each item is administered the examinee gives his/her response. The CAT needs to obtain the ability estimation from the observed responses to the set of administered items. The examinee’s test score will be his/her ability estimate when the test is over. The most widely used methods of estimation are based on the principle of maximum likelihood or Bayesian procedures. These methods have good properties when the number of items is high. Nevertheless, CATs are far from this ideal situation because they use very few items. This circumstance gives rise to biased estimations, especially in the early stages of the test. So, a problem with CAT is to find a method that provides accurate estimations, which are unbiased and computationally efficient. Wang and Vispoel (1998) and Cheng and Liou (2000) have compared the characteristics (standard error, bias, efficiency, etc.) of the IRT estimation methods to determine when they are advisable in CAT. Since one of the problems that has been paid the most attention is the bias of the estimations, several strategies have been proposed for its control. Other questions, like the initial estimation and the effects of non-model fitting responses, have generated interest from researchers.

Once some items have been administered and an ability estimate for the examinee obtained, a new item has to be selected from the unused items remaining in the item pool. Two common principles are used to guide item selection. Under the maximum information principle, the information provided by each unused item at the last ability estimate is computed. The item with the highest information value is selected and administered. In other words, the item that helps most to increase precision is selected. The maximum information principle faces some difficulties when the ability estimate is biased or inaccurate, which is often the case when the test is short. If the estimate departs appreciably from the final estimation, the most informative items for these provisional estimations will be less informative for the final estimation. As a result, some items will have been of little use in the test. Cheng and Liou (2000) have proposed the use of alternative information measures in order to circumvent this difficulty. Under the maximum expected precision criterion, the item selected will minimize the expected value of the variance of the Bayesian posterior distribution of ability. Several item-selection criteria based on this principle have been proposed (van der Linden, 1998).

Both procedures share a problem derived from choosing in each test the best items in


the pool: some items are administered in most tests (even in more than 50% of the tests), risking test security and validity, whereas others are never shown (in one particular CAT, 80% of the items were never selected). To resolve this difficulty, exposure control methods have been implemented. These methods trade off precision against security. When a CAT uses an exposure control method, such as the Sympson–Hetter method (Sympson & Hetter, 1985), precision of measurement is not as high, but the exposure rate of the most useful items is held under control, and smaller exposure rates are obtained. Experimental CATs may have implemented one of the two pure item-selection procedures indicated above, but if a CAT has to give valid measures it needs to attend to other considerations in order to select items, such as the appropriate representation of the content or subject areas, the guarantee that the composition of the test is similar for all examinees, the control of the presence of items that should not appear together in the same test, etc. Item-selection rules should then consider not only the basic principles indicated above, but also item exposure control and other restrictions. Linear programming techniques are often used to make item selection feasible when different restrictions have to be simultaneously considered.

The administration of items ends when either the test length or ability precision reaches its preset value. In the second case, all the examinees will be measured with the same precision, but the number of items administered and testing time will differ. Sometimes the stopping criterion is mixed: the test stops after presenting a preset number of items if it does not reach the targeted precision.

PSYCHOMETRIC PROPERTIES

As in a conventional test, reliability and validity studies have to be carried out in a CAT. Besides traditional reliability methods, such as test–retest, simulation may be used to obtain information on test functioning. Indices such as RMSE, bias and efficiency can be easily computed. Concerning validity, the procedures in use for conventional tests are applicable to CATs. For further details on this, see Chapters 7 and 8 of Wainer et al. (2000).

OTHER RESEARCH TOPICS

New Types of CATs

Most of the CATs have been elaborated to measure intelligence, aptitudes or achievement, and they are based on IRT models for unidimensional dichotomous items. However, other alternatives have been considered in the past few years. The need to measure other constructs such as attitudes, whose items have the format of ordered categories, and the possibility of using the incorrect options of the multiple-choice items to improve the estimation of ability, started a new line of work interested in elaborating CATs based on diverse types of polytomous models. Also, the acknowledgement that more than one trait intervenes in almost all the tasks has led to the use of multidimensional CATs (Segall, 2000).

CATs in Intelligent Tutoring Systems

The use of IRT in CATs imposes a few important constraints (Almond & Mislevy, 1999): (a) IRT has a simple way of representing knowledge and skills that intervene in complex tasks (unidimensionality); (b) it establishes strong assumptions that can be violated on some occasions (conditional independence); (c) it requires large samples to estimate the item parameters; and (d) it offers a score to express the level of ability, which does not exactly indicate what the subjects know or can do (diagnosis). All these aspects reduce its use in measuring domains that require multiple knowledge, skills and abilities, as in educational and job performance assessment. There is a tendency in education to integrate measurement, assessment, diagnosis, teaching and learning. This means that it is necessary to know in detail the knowledge and skills mastered by the students, the kinds of mistakes they make, the strategies they use, etc. to be able to adapt the contents and pedagogic strategies to them. To what extent can this be achieved by available CATs? Hardly at all, unfortunately.

This orientation in performance assessment is creating the need to introduce important changes in CATs, most of them coming from the literature on intelligent tutoring systems (ITS). In computerized teaching, since the ITS appeared,


there has been a growing interest in giving it more capacity of accommodation for the candidates, including a CAT in its module of assessment. Most of these systems do not use IRT; they are based instead on other methodologies, such as the rule-space methodology (Tatsuoka & Tatsuoka, 1997), the knowledge spaces (Hockemeyer & Albert, 1999), Bayesian networks, or graphical models (Almond & Mislevy, 1999), etc.

Conditions of Application

Lastly, many practical problems emerge when CATs become operative instruments used in real life. One main concern is how to guarantee the security of the item pool against attempts at illegitimate appropriation of its contents as well as the complexity and high costs of the elaboration process, maintenance and renewal. A second topic of interest tries to make the conditions of test administration better psychologically for the candidates, such as obtaining optimum adjustment in the difficulty of the test, allowing review of the answers, and controlling the difficulty of the items to reduce anxiety.

PROSPECTS AND CONCLUSIONS

From a technical perspective, there have been significant steps made in ability estimation methods. Likewise, the item-selection heuristics have reached a level of sophistication that makes them capable of guaranteeing the elaboration of tests that meet multiple requirements. In the next generation of CATs, new models will be used. Very soon we will be able to see comparative studies that analyse these new models that are multidimensional and can handle polytomous response data. However, the CATs elaborated from these models have yet to prove that their advantages are worth the additional effort their elaboration requires. This is especially true for multidimensional models.

Many practical problems have emerged with CAT going operational (Wise & Kingsbury, 2000), especially those related to test security and costs. Wainer (2000) provided a critical discussion of the supposed advantages attributed to CATs in the 1980s, from the experience accumulated on their massive use in the 1990s. His conclusion, though not very favourable, is not discouraging. Wainer argues for more focus on areas where CATs will be useful: (a) when the construct cannot be measured easily without a computer, (b) when the test has to be continuously administered, and (c) when it is important for everyone involved to get the right measurement.

In the past few years, there has been a growing tendency to extend the use of CATs to the Internet, using its Web service (CAT-Web). This tendency basically responds to the interest in having the distance learning system also offer an individualized assessment. In this way, more ITS destined for the Internet are continuously being released, and some of them already include a CAT-Web.

Finally, two challenges that CATs will face in the future will be to offer diagnostic information of quality on multiple abilities and to substantially reduce the costs associated with the elaboration of the item pool. In the first place, CATs have to go further than the unidimensional dichotomous IRT models, and especially to solve in an efficient way the problem of multidimensionality. Moreover, according to the objectives of the test, offering quantitative scoring may not be enough. The solution could be far from IRT. The possibilities offered by the models of measurement based on knowledge, like the Bayesian inference network or the knowledge space theory, would have to be seriously considered. In the second place, an alternative to online calibration and the automatic generation of items that could serve to reduce costs is to elaborate instruments of measurement that learn to measure. The necessary elements would be a theoretical model of the construct that is well supported, a psychometric model, a group of experts on the subject to obtain the initial parameters and an algorithm of learning. The test will modify the initial estimates of the experts from the empirical information collected, and from its execution in activities in which it could be trained through simulation. The algorithm of learning would bring the values of the parameters up to date so they would adapt to the predictions of the theoretical model and the available empirical evidence. The uses of the scoring would be conditioned to the degree of competence achieved by the test. Although it may seem far-fetched, some attempts


are being made in this direction in CATs of some ITS.

References

Almond, R.G. & Mislevy, R.J. (1999). Graphical models and computerized adaptive testing. Applied Psychological Measurement, 23(3), 223–237.
Cheng, P.E. & Liou, M. (2000). Estimation of trait level in computerized adaptive testing. Applied Psychological Measurement, 24(3), 257–265.
Hockemeyer, C. & Albert, D. (1999). The adaptive tutoring system RATH: a prototype. In Auer, M.E. & Ressler, U. (Eds.), ICL99 Workshop Interactive Computer Aided Learning: Tools and Applications. Austria: Willach.
Lord, F.M. (1970). Some test theory for tailored testing. In Holtzman, W.H. (Ed.), Computer Assisted Instruction, Testing and Guidance (pp. 139–183). New York: Harper and Row.
Sands, W.A., Waters, B.K. & McBride, J.R. (1997). Computerized Adaptive Testing: From Inquiry to Operation. Washington: American Psychological Association.
Segall, D.O. (2000). Principles of multidimensional adaptive testing. In van der Linden, W.J. & Glas, C.A.W. (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 53–73). Boston, MA: Kluwer-Nijhoff.
Sympson, J.B. & Hetter, R.D. (1985). Controlling Item Exposure Rates in Computerized Adaptive Testing. Paper presented at the meeting of the Military Testing Association, San Diego, CA.
Tatsuoka, K.K. & Tatsuoka, M.M. (1997). Computerized cognitive diagnostic adaptive testing: effect on remedial instruction as empirical validation. Journal of Educational Measurement, 34(1), 3–20.
van der Linden, W.J. (1998). Bayesian item selection criteria for adaptive testing. Psychometrika, 63(2), 201–216.
Wainer, H. (2000). CATS: whither and whence. Psicológica, 21, 121–133.
Wainer, H., Dorans, N.J., Flaugher, R., Green, B.F., Mislevy, R.J., Steinberg, L. & Thissen, D. (2000). Computerized Adaptive Testing: A Primer (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Wang, T. & Vispoel, W.P. (1998). Properties of ability estimation methods in computerized adaptive testing. Journal of Educational Measurement, 35, 109–135.
Wise, S.L. & Kingsbury, G.G. (2000). Practical issues in developing and maintaining a computerized adaptive testing program. Psicológica, 21, 135–155.

Vicente Ponsoda and Julio Olea

Related Entries

Computer-Based Testing, Item Banking, Item Response Theory: Models and Features
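
Illustrative sketch. The four components listed under ADAPTIVE HEURISTICS (first item, ability estimation, item selection, stopping criterion) can be made concrete with a short simulation. The code below is only one possible arrangement and is not taken from the entry: it assumes a two-parameter logistic (2PL) IRT model, a Bayesian expected a posteriori (EAP) estimate computed on a fixed grid with a standard normal prior, the maximum information principle for selection, and the mixed stopping rule (target precision or maximum length). All function names, the grid, and the default values of `max_items` and `target_sd` are hypothetical choices made for illustration, and no exposure control or content constraint is applied.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# Assumed quadrature grid for ability, from -4 to 4 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]

def eap_estimate(responses, bank):
    """EAP ability estimate and posterior SD under an assumed N(0, 1) prior.

    responses: list of (item_index, score) pairs with score in {0, 1}.
    """
    weights = []
    for theta in GRID:
        log_w = -0.5 * theta * theta  # log of the unnormalised prior density
        for idx, score in responses:
            p = p_correct(theta, *bank[idx])
            log_w += math.log(p if score == 1 else 1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def select_item(theta, bank, used):
    """Maximum information principle: most informative unused item at theta."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *bank[i]))

def run_cat(bank, answer, max_items=20, target_sd=0.35, rng=None):
    """Administer items until the posterior SD or the test length hits its limit."""
    rng = rng or random.Random()
    # (a) No prior information: start from a random central ability value.
    theta, sd = rng.uniform(-0.5, 0.5), float("inf")
    used, responses = set(), []
    # (d) Mixed stopping rule: target precision or maximum test length.
    while len(used) < min(max_items, len(bank)) and sd > target_sd:
        item = select_item(theta, bank, used)       # (c) item selection
        used.add(item)
        responses.append((item, answer(item)))
        theta, sd = eap_estimate(responses, bank)   # (b) ability and precision
    return theta, sd, [i for i, _ in responses]

# Usage with a simulated bank and a simulated examinee (all values invented).
rng = random.Random(0)
bank = [(rng.uniform(0.8, 2.0), rng.uniform(-3.0, 3.0)) for _ in range(200)]
true_theta = 1.0

def simulated_answer(item):
    return 1 if rng.random() < p_correct(true_theta, *bank[item]) else 0

theta_hat, sd, administered = run_cat(bank, simulated_answer, rng=rng)
print(f"estimate={theta_hat:.2f} sd={sd:.2f} items={len(administered)}")
```

Running many such simulated examinees and comparing the estimates with the known true abilities is one way to obtain the RMSE and bias indices mentioned under PSYCHOMETRIC PROPERTIES. Replacing `select_item` with a rule that also tracks how often each item has been administered would be the natural place to add exposure control.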

AMBULATORY ASSESSMENT

INTRODUCTION

Ambulatory assessment designates a new orientation in behavioural and psychophysiological assessment. This approach relates to everyday life (‘naturalistic’ ) and claims the ecological validity of research findings. Methods of recording psychological data in everyday life have a long history in and . Event recorders for the timed registration of stimuli and responses, ‘beeper’ studies in which a programmable wristwatch prompts the subject to respond to a questionnaire, self-ratings on diary cards, and electronic data loggers have all been used for this purpose. The arrival of pocket-sized (hand-held, palm-top) computers has eased the acquisition of data considerably. Computer-assisted methodologies facilitate investigations in real-life situations where relevant behaviour can be much more effectively studied than in the artificial environment of laboratory research. Such field studies are essential, for example, in research on stress–strain or in research on mechanisms that trigger off psychological and psychophysiological symptoms.

Ambulatory assessment originated from a number of previously rather independent research orientations with specific objectives. Clinical (bedside) monitoring was introduced as a means of continuously observing a patient’s vital functions, e.g. respiratory and cardiovascular


parameters under anaesthesia, during intensive care and in perinatal condition. If relevant changes occur, i.e. if certain critical values are exceeded, an alarm is set off. Biotelemetry employs transmitter–receiver systems (radio-telemetry) in order to measure physical functions in real life, e.g. cardiovascular changes during intense strain at the workplace or during athletic performance. Radio equipment basically makes two-sided communication possible, i.e. feedback, telestimulation and telecommand, in addition to telemetry. Ambulatory monitoring means continuous observation of free-moving subjects (patients) in everyday life as compared to stationary, bedside (‘wired’) monitoring. Ambulatory monitoring can be conducted either by biotelemetry or by a portable recording system. This methodology is appropriate for patients who exhibit significant pathological symptoms which, for a number of reasons, cannot be reliably detected in the physician’s office or hospital as compared to a prolonged observation in everyday life. Such cases include ventricular arrhythmia, ischaemic episodes, sleep apnea, and epileptic seizures. Here, ambulatory monitoring furthers valid diagnoses as well as the stabilization of medication. Field research comprises observation in natural settings in contrast to the laboratory. Field research is an essential methodology in cultural anthropology, social research, and ethology. Likewise, in psychology and psychophysiology some research issues require field studies to obtain valid data (see Kerlinger & Lee, 2000; Patry, 1982). Behavioural assessment methods include, besides laboratory observation, a variety of in-vivo (in-situ) tests, simulated and quasi-naturalistic settings, such as behavioural approach/avoidance tests (BATs) which were designed to assess behaviour disorders and clinical symptoms.

Ambulatory assessment brings together those research orientations that correspond to each other in their basic ecological perspective. Ambulatory assessment involves the acquisition of psychological data and/or physiological measures in everyday life according to an explicit assessment strategy which relates data, theoretical constructs, and empirical criteria specific to the given research issue. Such field studies are not solely concerned with the ambulatory monitoring of patients, but rather include a wide spectrum of objectives and applications. Common features are: recordings in everyday life, computer-assisted methodology, attempts to minimize method-dependent reactivity, maintaining ecological validity and, therefore, outstanding practical utility for various objectives – such as monitoring and self-monitoring, screening, classification and selection, clinical diagnosis, and evaluation – in many areas of psychology and (de Vries, 1992; Fahrenberg & Myrtek, 1996, 2001a,b; Littler, 1980; Miles & Broughton, 1990; Pawlik & Buse, 1982, 1996; Pickering, 1991; Suls & Martin, 1993).

ACQUISITION OF PSYCHOLOGICAL DATA BY HAND-HELD PC

In psychology the hand-held PC so far has been predominantly used for recording self-reports on mood and other aspects of subjective state, including physical complaints and symptoms, that is, as an ‘electronic diary’ (e.g. job stress diary, pain diary). There are other kinds of data, which can be obtained in field studies: objective features of a behaviour setting, behaviour , behaviour and performance measures (psychometric testing), self-measurements of various kinds, for example, blood pressure data, and, possibly, ratings of environmental aspects. Potential contents of a computer-assisted protocol may further include, for example, individual comments or self-evaluation in connection with events.

Advantages and Limitations

The application of a programmable pocket PC in ambulatory assessment has many advantages:

• alarm functions for prompting the subject at predefined intervals and a built-in reminder signal;
• reliable timing of input, delay of input, and duration of input;
• flexible layout of questions and response categories;
• branching of questions and tailor-made sequential or hierarchical strategies;
• concealment of previously recorded responses from the subjects;
• convenience and ease of transfer of data to a stationary PC for statistical analysis.


A higher technical reliability and ecological validity of computer-assisted recordings can be generally assumed compared with paper-and-pencil questionnaires and diaries that lack flexibility in data acquisition and exactness when timing responses. The versatility and wide acceptance of computer-assisted data acquisition is evident, although there are limitations and obvious restrictions. All participants of such studies will need sufficient practical training in the basic features of the PC and the program, at least, to avoid malfunctions and missing data. In spite of the obvious increase in computer literacy within the general population, there are sub-populations which are less familiar with such devices or may experience problems.

Following the progress made in pocket-sized computers, software to facilitate the use of hand-held PCs in field studies has been developed in many institutions, more or less geared to the needs of certain studies. More flexible software systems suited to the requirements of a variety of applications are still an exception (AMBU for in-field performance testing, cf. Buse and Pawlik, Hogrefe Verlag, Göttingen, Germany; MONITOR, a flexible software system for ambulatory assessment, Psychology Department, , Germany). The OBSERVER (Behaviour Observation System, Noldus Information Technology, AG Wageningen, NL, Noldus, 1991) was introduced to ease the recording of behaviour observations in field studies in animal and human ethology.

Computer-assisted self-reports require a hand-held PC with certain features: a large display, easy handling of basic controls, clock, beeper with volume control, sufficient capacity of storage, low power consumption, and a low weight. For many applications, a comparatively large alphanumeric keyboard (complete QWERTY) is also preferable in order to ease recording and, especially, to record verbal responses. The latter may involve, for example, recording reports and comments about specific events, or reporting more precisely the occurrence of physical and psychological symptoms, which in either case hardly fit pre-defined categories. For some applications it may suffice to record only ‘yes’ or ‘no’ responses or numbers. In this case, a smaller hand-held PC, e.g. the Palm series, may be preferable, although small keys or a stylus may present a problem for some subjects or patients.

PHYSIOLOGICAL MEASUREMENT AND MONITORING

A wide selection of physiological variables has been measured in daily life, using mostly non-invasive methods. The ECG and blood pressure enjoy by far the largest number of references in the literature to ambulatory monitoring. The application is predominantly in the medical field and only to a much smaller extent in the behavioural sciences, for example, in psychophysiology or behavioural medicine. The advances in microprocessor technology and storage capacity paved the way for multi-channel recordings and – another innovative step – led to the on-line analysis of medically important changes, for example, the immediate detection of ST-depression in the ECG.

The recording of posture and motion is another basic issue in the methodology of behaviour observation and performance measurement. Piezoresistive sensors (multi-site calibrated accelerometry) allow for:

• continuous recording and automatic detection of changes in posture and movement;
• assessment of movement disorders, such as hand tremor, restless legs syndrome;
• detection of head movement, e.g. nodding during a conversation, measured by a small accelerometer placed beneath the chin;
• estimation of gross physical activity and energy expenditure.

To assist in objective behaviour analysis, a range of interesting variables could be measured continuously:

• voice signal recorded via a throat sensor (micro);
• the temporal pattern of speech;
• ambient conditions recorded via suitable sensors for light, noise, and temperature.

Some hand-held PCs allow for audio recordings up to a number of minutes, depending on storage capacity. Digital dictating systems have a capacity up to 240 minutes in long play 2 mode. In psychological and psychophysiological research, so far, little use has been made of digital


mini-cams or web-cams for recordings of the videostream of behaviour.

Recorder–Analyser

Today, more than a dozen recorder–analyser systems are available from international manufacturers – not to mention the even greater number of long-term ECG recorders–analysers and the long-term-BP recorders. Only a few systems have a multi-channel design, the advantage of which is that they can be applied to a variety of research questions that require different recording channels (for an overview of selected multi-purpose recorder systems, see Fahrenberg, 2001). Besides the devices suitable for ambulatory recordings and their use in 24-h or long-term monitoring, a wide range of portable (mobile) equipment designed for in-field measurement does exist.

ISSUES IN AMBULATORY ASSESSMENT

Assessment strategies, designs, and data analysis

In psychological research various designs for computer-assisted ambulatory assessment have already been employed, whereby some of these assessments lasted for many days or weeks. In psychophysiology and in medicine, the restriction to a single 24-h recording appears to be the preferred format due to the costly equipment. Ambulatory assessment requires the elaboration of specific designs and strategies, for example, the strategic use and integration of time and event sampling, and the development of appropriate statistical models for multi-level analyses and for rather short time series (for a discussion, see Fahrenberg & Myrtek, 2001a; Schwarz & Stone, 1998; Stemmler, 1996). It would be oversimplified to state methodological advantages of the laboratory experiment as obstacles in field studies and vice versa, i.e. to retain the notion of basically different research strategies instead of a wider perspective that includes laboratory and field as complementary approaches.

Laboratory–field comparisons are designed to examine the validity of findings obtained in the laboratory to predict performance in real life. In the development of psychological tests such empirical validation studies play an important role. More recently, it has been questioned whether certain diagnostic techniques and measurements in the physician’s office or in the psychophysiological laboratory, e.g. blood pressure measurement, reliably predict individual differences in real life. Laboratory–field comparisons revealed significant discrepancies. Office hypertension is a good example of how certain features of the setting and their meaning to the subject may play an important role in assessing individual differences: blood pressure readings are elevated if the measurement is made by the physician, but normal readings are obtained in everyday life.

Laboratory–field comparisons were valuable in the evaluation of methodological issues as well as practical aspects. Field studies, apparently, are more suited for prolonged observation that may extend over days and weeks. Accordingly, there is more chance for the detection of rare events and symptoms that occur at low frequencies or only in certain settings. Generally, larger response magnitudes and more realistic effect sizes may be expected in natural settings. Prolonged observation periods make the averaging/aggregation of measurements possible so that reliability and stability of measures may increase substantially. But field studies can be seriously threatened by the confounding of multiple effects which tend to produce ‘noise’ and, eventually, require relatively large subject samples in order to obtain valid estimates for main effects.

Psychophysiological monitoring. Multi-modal psychophysiological 24-h monitoring methodologies were developed in many fields, especially in research on blood pressure reactivity. This method consists of multi-channel recordings of blood pressure, heart rate, physical activity, and – concurrent with each blood pressure measurement – a computer-assisted self-report on setting, behaviour, emotional state, and experience.

Controlled monitoring. Recordings obtained in everyday life will often include multiple effects. Therefore, investigators may wish to control for unwanted variance, such as blood pressure changes caused by physical activity. Concurrent recordings of physical activity provided means for a segmentation of recordings according to high or low activity. Furthermore, standardized or semi-standardized measurement periods were included which served as a reference for inter- and intra-individual

[8.8.2002–12:29pm] [1–128] [Page No. 16] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword Ambulatory Assessment 17

comparison. As part of the standard protocol in 24-h monitoring, the subjects performed specific tasks: climbing a staircase, performing a mental test, and participating in a short interview.

Interactive monitoring. The development of recorder equipment suitable for physiological and psychophysiological recordings and on-line (real-time) analysis led to innovative research strategies. Contingent on changes of certain physiological parameters, a patient can be prompted by a beeper signal to record specific events, activities, or symptoms. Myrtek et al. (1988) developed a new methodology for interactive monitoring of ‘additional heart rate’ indicative of emotional states.

Acceptance, Compliance, and Reactivity

From the beginning, there have been concerns raised about the acceptance of hand-held PCs, and the validity of monitoring in daily life has been questioned. Ambulatory assessment with a pocket PC or recorder depends on the favourable attitude of the participating subjects. It is essential that the equipment is readily accepted and that good compliance with instructions is established and sustained. If the ambulatory monitoring is part of a diagnostic process or a treatment programme, the patient’s compliance may be higher than in research projects. The ambulatory assessment should, of course, not cause major problems with the social environment.

The method of observation and measurement itself may cause unwanted variance because of specific interactions such as awareness, adaptation, sensitization, and coping tendencies. Three aspects of reactivity appear to be specific to ambulatory assessment. Subjects may: (1) tend to steer clear of certain settings during the recording in order to avoid being monitored there; (2) tend to unintentionally or deliberately manipulate the recording systems, shift settings of the PC and may even try to get access to the program; and (3) try to test their capacities or the equipment by unusual patterns of behaviour, exercise or vigorous movements. A comprehensive post-monitoring interview is recommended in order to obtain information on these essential aspects.

Ethical issues that are specific to ambulatory monitoring studies have hardly been discussed yet. Appropriate data protection is but one aspect, as ambulatory assessment may violate privacy more easily than other methods. Furthermore, significant others and bystanders may become involved when the observation and the evaluation of settings are demanded. Obtaining the subject’s informed consent before the recording starts is essential, but may be problematic since the exact course of daily activities and events cannot be anticipated.

Acceptance and impact of computer-assisted monitoring methodology in psychophysiology and psychology. Ambulatory BP and ECG monitoring are now indispensable routine methods in medicine. The ever more widespread application of the new methodology can be attributed to its practical usefulness which was evident in the increased validity of diagnosis and in the external validity of therapy outcome evaluation. In contrast, computer-assisted monitoring and assessment still appear to have had little impact in psychophysiology and psychology. Standard textbooks on behavioural research methods and assessment in clinical psychology hardly refer to the new methodologies based on computer-assisted data acquisition and monitoring in the natural environment.

PERSPECTIVES

Computer-assisted ambulatory assessment is an emerging new methodology. Progress is obvious not only in instrumentation, but in assessment strategies as well. Ambulatory assessment, like any other method, has problematic aspects, in particular how to account for multiple effects in the recordings, but the benefits are evident:

• recording of relevant data in natural settings;
• real-time measurement of behavioural and physiological changes;
• real-time assessment and feedback by reporting physiological changes to the subject;
• concurrent assessment of psychological and physiological changes (detection of events, episodes);
• correlation and contingency (symptom–context) analysis across systemic levels as suggested in triple-response models (multi-modal assessment);
• ecological validity of findings and suitability for direct application.
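The interactive monitoring strategy described earlier (prompting a self-report when a physiological parameter changes) can be illustrated with a minimal sketch. It is not the algorithm of Myrtek et al. (1988); it only assumes that a prompt is issued when heart rate rises clearly above its recent level while physical activity does not. The function name, the window length and both thresholds are hypothetical values chosen for illustration.

```python
from statistics import mean


def prompt_minutes(heart_rate, activity, window=3, hr_rise=10.0, max_activity_rise=5.0):
    """Return the indices of minutes at which a self-report prompt would be issued.

    heart_rate, activity: equally long sequences of per-minute values.
    A minute is flagged when heart rate exceeds the mean of the preceding
    `window` minutes by at least `hr_rise`, while activity exceeds its own
    preceding mean by no more than `max_activity_rise`.
    All parameter defaults are illustrative assumptions, not published values.
    """
    if len(heart_rate) != len(activity):
        raise ValueError("heart_rate and activity must have the same length")
    flagged = []
    for t in range(window, len(heart_rate)):
        hr_delta = heart_rate[t] - mean(heart_rate[t - window:t])
        act_delta = activity[t] - mean(activity[t - window:t])
        # A heart rate rise not accompanied by movement triggers the beeper.
        if hr_delta >= hr_rise and act_delta <= max_activity_rise:
            flagged.append(t)
    return flagged


# Hypothetical per-minute recordings: minute 3 shows a heart rate rise at rest,
# minute 7 shows a rise that coincides with increased movement.
hr = [70, 71, 69, 85, 72, 70, 71, 95]
act = [10, 11, 10, 10, 11, 10, 10, 40]
print(prompt_minutes(hr, act))  # [3]
```

In this sketch only minute 3 is flagged, because the later rise coincides with a jump in activity. A real system would need validated thresholds and artefact handling.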


Genuine research findings in relevant fields suggest further development and application of ambulatory assessment methodology. The expectation is that the hand-held PCs and the recorder–analysers for physiological measures will in future become smaller, cheaper and more refined. Such developments may include new strategies in controlled or interactive monitoring and on-line feedback, monitoring and concurrent recording of audio and video signals (intelligently pre-processed before being stored), setting-dependent sampling, and new strategies in self-monitoring and self-management in chronic illness.

A hand-held PC may be useful in the diagnostic assessment of a variety of behaviour disorders, for example, the assessment and self-management of drinking, smoking, and of eating disorders, and in facilitating self-management in chronic illness. Computer programs that are based on a hand-held PC can be used as a component of behavioural therapy (cf. a pilot study by Burnett et al., 1985).

There are noticeable developments which probably exert an essential influence on the computer-assisted methods in medicine and the behavioural sciences: the arrival of the wireless application protocol WAP, mobile phone short message systems SMS, the web-based mobile telecommunication (IMT-2000 and UMTS) and the new patient monitoring equipment, which appears to revolutionize the way in which patient information is transmitted and used in the healthcare system. At present, we may only speculate about the consequences of such developing information technologies for the healthcare system and, to some extent, on subsequent developments in applied fields of psychology.

CONCLUSIONS

During the last two decades, a fast development in microprocessor technology has enabled the design of new instrumentation and, accordingly, new methodologies in medicine and the behavioural sciences. Multi-channel recorders–analysers and special purpose devices for physiological measures and convenient hand-held PCs for acquisition of psychological data are available. Such systems allow innovative research and practical application in many fields and essential findings have been obtained.

References

Burnett, K.F., Taylor, C.B. & Agras, W.S. (1985). Ambulatory computer assisted therapy for obesity: a new frontier for behavior therapy. Journal of Consulting and Clinical Psychology, 53, 698–703.

de Vries, W. (Ed.) (1992). The Experience of Psychopathology. Investigating Mental Disorders in Their Natural Settings. Cambridge: Cambridge University Press.

Fahrenberg, J. (2001). Origins and developments of ambulatory monitoring and assessment. In Fahrenberg, J. & Myrtek, M. (Eds.), Progress in Ambulatory Assessment: Computer-Assisted Psychological and Psychophysiological Methods in Monitoring and Field Studies (pp. 587–614). Seattle, WA: Hogrefe and Huber.

Fahrenberg, J. & Myrtek, M. (Eds.) (1996). Ambulatory Assessment: Computer-Assisted Psychological and Psychophysiological Methods in Monitoring and Field Studies. Seattle, WA: Hogrefe and Huber.

Fahrenberg, J. & Myrtek, M. (Eds.) (2001a). Progress in Ambulatory Assessment: Computer-Assisted Psychological and Psychophysiological Methods in Monitoring and Field Studies. Seattle, WA: Hogrefe and Huber.

Fahrenberg, J. & Myrtek, M. (2001b). Ambulantes Monitoring und Assessment. In Rösler, F. (Ed.), Enzyklopädie der Psychologie. Serie Biologische Psychologie. Band 4: Grundlagen und Methoden der Psychophysiologie (pp. 657–798). Göttingen: Hogrefe.

Kerlinger, F.N. & Lee, H.B. (2000). Foundations of Behavioral Research. Fort Worth, TX: Harcourt.

Littler, W.A. (Ed.) (1980). Clinical and Ambulatory Monitoring. London: Chapman and Hall.

Miles, L.E. & Broughton, R.J. (Eds.) (1990). Medical Monitoring in the Home and Work Environment. New York: Raven Press.

Myrtek, M., Brügner, G., Fichtler, A., König, K., Müller, W., Foerster, F. & Höppner, V. (1988). Detection of emotionally induced ECG changes and their behavioral correlates: a new method for ambulatory monitoring. European Heart Journal, 9 (Supplement N), 55–60.

Noldus, L.P.J.J. (1991). The Observer: a software system for collection and analysis of observational data. Behavior Research Methods, Instruments & Computers, 23, 415–429.

Patry, J.L. (Ed.) (1982). Feldforschung. Methoden und Probleme sozialwissenschaftlicher Forschung unter natürlichen Bedingungen. Bern: Huber.

Pawlik, K. & Buse, L. (1982). Rechnergestützte Verhaltensregistrierung im Feld: Beschreibung und erste psychometrische Überprüfung einer neuen Erhebungsmethode. Zeitschrift für Differentielle und Diagnostische Psychologie, 3, 101–118.


Pawlik, K. & Buse, L. (1996). Verhaltensbeobachtung in Labor und Feld. In Pawlik, K. (Ed.), Enzyklopädie der Psychologie. Differentielle Psychologie und Persönlichkeitsforschung. Band 1. Grundlagen und Methoden der Differentiellen Psychologie (pp. 359–394). Göttingen: Hogrefe.

Pickering, T.G. (1991). Ambulatory Monitoring and Blood Pressure Variability. London: Science Press.

Schwarz, J.E. & Stone, A.A. (1998). Strategies for analysing ecological momentary assessment data. Health Psychology, 17, 6–16.

Stemmler, G. (1996). Strategies and designs in ambulatory assessment. In Fahrenberg, J. & Myrtek, M. (Eds.), Ambulatory Assessment: Computer-Assisted Psychological and Psychophysiological Methods in Monitoring and Field Studies (pp. 257–268). Seattle, WA: Hogrefe and Huber.

Suls, J. & Martin, R.E. (1993). Daily recording and ambulatory monitoring methodologies in behavioral medicine. Annals of Behavioral Medicine, 15, 3–7.

Jochen Fahrenberg

Related Entries

Psychophysiological Equipment and Measurement, Applied Fields: Psychophysiological, Equipment for Assessing Basic Process, Field Survey: Protocols Development, Observational Methods (General)

ANALOGUE METHODS

INTRODUCTION

Analogue behavioural observation (ABO) involves a situation designed by, manipulated by, or constrained by an assessor that elicits a measured behaviour of interest. Observed behaviours comprise both verbal and non-verbal emissions (e.g. motor actions, verbalized attributions, observable facial reactions). ABO exists on a continuum of naturalism, ranging from highly contrived situations (e.g. How quickly do people walk down the hallway after being exposed to subconsciously presented words about ageing? Bargh et al., 1996) to naturalistic situations arranged in unnatural ways or settings (e.g. How do couples talk with one another when asked to discuss their top problem topic? Heyman, 2001) to naturalistic situations with some (but minimal) experimenter-dictated restrictions (e.g. family observations in the home; Reid, 1978).

WHY USE ABO?

ABO is used as a hypothesis-testing tool for three purposes: (a) to observe otherwise unobservable behaviours, (b) to isolate the determinants of behaviour, and (c) to observe dynamic qualities of social interaction. Although naturalistic observation might be preferable (i.e. generalizability inferences are minimized), the first two purposes require controlled experimentation, necessitating ABO; for the third purpose, ABO is often preferable because it allows the observer to “stack the deck” to make it more likely that the behaviours (and/or functional relations) of interest will occur when the assessor can see them.

DOMAINS

ABO comprises two main assessment domains: individual/situation interactions and social situations. The goals of individual/situation interaction experiments are to manipulate the setting and test individual differences in response. This domain comprises a wide variety of tasks in developmental psychology (e.g. strange situation experiments; Ainsworth et al., 1978), social psychology (e.g. emotion regulation experiments; Tice et al., 2001) and clinical psychology (e.g. functional analysis of self-injurious behaviour; Iwata et al., 1994; social anxiety assessment; Norton & Hope, 2001).

The social situation domain employs ABO mostly as a convenience in assessing quasi-naturalistic interaction. The goal of such assessment is typically to understand behaviour and its determinants in dynamic, reciprocally influenced systems


(e.g. groups, families, couples). Understanding generalizable factors that promote or maintain problem behaviours in such systems typically requires more naturalistic approaches than those used in the other domain. Thus, although experimentation is often extremely useful in understanding causal relations in social situations (e.g. whether maternal attributions affect mother–child interactions; Slep & O’Leary, 1998), most such ABO investigations aim for quasi-naturalism.

CLINICAL ASSESSMENT

ABO is a useful tool in clinical assessment, although relatively few ABO paradigms have been developed specifically with this application in mind. To be clinically useful, ABO must efficiently provide reliable, valid, and non-redundant (but cost-effective) information.

An apt analogy for research-protocol based assessment vs. field-realistic assessment might be found in the treatment literature. In recent years, a distinction has evolved between efficacy studies (i.e. those studying interventions under tightly controlled, idealized circumstances, such as a trial of treatment for major depressive disorder that eliminates all potential participants with co-morbid disorders) and effectiveness studies (i.e. those studying interventions under real-world conditions). Because we do not have an adequate research body of effectiveness studies, clinicians in the field, urged to use empirically validated treatments, are expected to adapt such protocols to meet real-world demands. Similarly, clinicians should be urged to use empirically validated ABO when it would be appropriate, but should be expected to adapt ABO protocols in a cost-effective but still clinically informative manner.

ABO Protocols

Space limitations preclude a summary of the wide variety of ABO protocols. We note, however, literatures on parent–child interaction (e.g. Roberts, 2001), couple interaction (e.g. Heyman, 2001), social anxiety and social interaction skills (Norton & Hope, 2001), fear (e.g. McGlynn & Rose, 1998), self-injurious behaviour in those with developmental disabilities (e.g. Iwata et al., 1994), the effect of alcohol consumption on family interaction (e.g. Leonard & Roberts, 1998), cooperation and competition (e.g. the prisoner’s dilemma paradigm; Sheldon, 1999), and aggression (e.g. Bandura, 1986).

Psychometric Considerations

Each ABO paradigm and its accompanying coding systems must be separately considered for reliability, validity, and utility. As with all psychological assessments, the psychometric properties of ABO depend ‘on the goals of assessment, the assessment settings, the methods of assessment, the characteristics of the measured variable, and the inferences that are to be drawn from the obtained measures’ (Haynes & O’Brien, 2000: 201).

The psychometrics of ABO paradigms and coding systems has received little direct attention (see a special issue of Psychological Assessment, March 2001, for a notable exception). The validity of an ABO paradigm is implied by the results of studies using that paradigm. As such, ABO paradigms and their coding systems often have excellent validity and reported inter-rater agreement.

Coding

Although we have described ABO as a hypothesis-testing tool, in reality it is a hypothesis-testing setting; coding the observed behaviours turns ABO into a true tool. Creation or use of a coding system is a theoretical act, and the following questions should be answered before proceeding: Why are you observing? What do you hope to learn? How will it impact your hypotheses (i.e. either research questions or case-conceptualization questions)? This is especially true because coding of many ABO target behaviours is difficult to do in a reliable, valid, and cost-effective manner. Interested readers should consult several excellent resources for more complete coverage (e.g. Bakeman & Gottman, 1997; Haynes & O’Brien, 2000).

Sampling

The major sampling strategies are event sampling (the occurrence of behaviour is coded, ideally in sequential fashion), duration sampling (the length of each behaviour is recorded), interval sampling (the ABO period is divided into time blocks; during each time block, the occurrence of each


code is noted), and time sampling (intermittent observations are made, typically in a duration or interval sampling manner). Advantages and disadvantages of each are discussed in Bakeman and Gottman (1997) and Haynes and O’Brien (2000).

Choosing What to Code

Some behaviours are so concrete that the observer serves more as a recorder than a coder (e.g. duration of a discrete behaviour). Other behaviours require at least some degree of inference. Such coding necessitates the use of culturally sensitive raters, using specified decision rules, to infer that a combination of situational, linguistic, paralinguistic, or contextual cues amounts to a codeable behaviour. Concrete codes are not necessarily better than social informant-inferred codes; sometimes one allows for a more valid measurement of a construct, sometimes the other does. In accord with Occam’s razor, coding should be as simple as possible to reliably capture the behavioural constructs of interest.

Global (i.e. molar) coding systems make summary ratings for each code over the entire ABO (or across large time intervals). Codes tend to be few, representing behavioural classes (e.g. negativity). Microbehavioural (i.e. molecular) systems code behaviour as it unfolds over time, and tend to have many fine-grained behavioural codes (e.g. eye contact, criticize, whine).

Topographical coding systems measure the occurrence of a behaviour (including, potentially, its duration). Dimensional coding systems measure the intensity of the behaviour. Microbehavioural systems tend to be topographical; although global systems tend to use rating scales, they may summarize frequency rather than intensity. Dimensional coding of intensity, especially on a point-by-point basis, has been used sparingly in ABO.

Analyses

ABO frequently uses single subject multiple baseline designs. Data are plotted and visually inspected for trends.

Statistical analysis of ABO data uses standard statistical tools. Between-groups hypotheses about behavioural frequencies are tested with ANOVA; continuous association hypotheses are tested with correlations or regressions.

When functional relations are of interest, testing how interactions unfold across time becomes important. Functional relation hypotheses can be addressed with conditional probabilities or with sequential analysis, which is similar to conditional probability analysis but which allows for significance testing. Dimensional data assessed continuously would use time-series analysis instead of sequential analysis.

CONCLUSIONS

ABO can be a good theory-testing tool because (depending on exactly how it is employed) it minimizes inferences needed to assess behaviour, and it can facilitate formal or informal functional analysis, provide the assessor with experimental control of situational factors, facilitate the observation of otherwise unobservable behaviours, and provide an additional mode of assessment in a multimodal strategy (e.g. questionnaires, interviews, observation). Finally, because the assessor can set up a situation that increases the probability that behaviours of interest will occur during the observation period, ABO can be high in clinical utility and research efficiency.

Like any tool, however, ABO’s usefulness depends on its match to the resources and needs of the person considering using it. ABO can be a time-, labour-, and money-intensive assessment strategy. The use of research-tested protocols/coding is often impractical in clinical settings; adaptations of empirically supported ABO methodology in clinical settings may render them unreliable and of dubious validity. The conditional nature of validity may make it difficult to generalize ABOs to the broad variety of real-world settings. Finally, the less naturalistic the ABO situation, the more nagging the concerns about external validity.

Acknowledgements

Preparation of this chapter was supported by the National Institutes of Mental Health (Grant R01MH57779) and National Center for Injury Prevention and Control, Centers for Disease Control and Prevention (Grant R49CCR218554-01).
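As an illustration of the conditional probability and sequential analysis approach mentioned under Analyses, the following minimal sketch computes a lag-1 transitional probability from an event-sampled code sequence and compares it with the base rate using an adjusted residual. The adjusted residual is one commonly used statistic and is an assumption here, not a procedure prescribed in this entry; the function name and the example codes are hypothetical.

```python
from math import sqrt


def lag1_analysis(codes, given, target):
    """Lag-1 statistics for how often `target` immediately follows `given`.

    codes: event-sampled behaviour codes in order of occurrence.
    Returns (conditional probability of target after given,
             base rate of target among all lag-1 outcomes,
             adjusted residual, roughly a z-score for long sequences).
    """
    pairs = list(zip(codes, codes[1:]))
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least two coded events")
    n_given = sum(1 for a, _ in pairs if a == given)
    n_target = sum(1 for _, b in pairs if b == target)
    n_joint = sum(1 for a, b in pairs if a == given and b == target)
    if n_given == 0:
        raise ValueError("the given code never occurs as an antecedent")
    p_cond = n_joint / n_given
    p_target = n_target / n
    # Frequency expected if target were independent of the antecedent.
    expected = n_given * n_target / n
    variance = expected * (1 - n_given / n) * (1 - n_target / n)
    z = (n_joint - expected) / sqrt(variance) if variance > 0 else float("nan")
    return p_cond, p_target, z


# Hypothetical codes from a short couple interaction.
sequence = ["criticize", "defend", "neutral", "criticize", "defend",
            "neutral", "neutral", "criticize", "defend", "neutral"]
p_cond, p_base, z = lag1_analysis(sequence, "criticize", "defend")
print(f"P(defend | criticize) = {p_cond:.2f}, base rate = {p_base:.2f}, z = {z:.2f}")
```

With so few events the z-score is only indicative; real sequential analyses need far longer sequences, and coding schemes in which a code cannot follow itself require adjusted expected frequencies.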


References

Ainsworth, M.D.S., Blehar, M.C., Waters, E. & Wall, S. (Eds.) (1978). Patterns of Attachment: A Psychological Study of the Strange Situation. Hillsdale, NJ: Erlbaum.

Bakeman, R. & Gottman, J.M. (1997). Observing Interaction: An Introduction to Sequential Analysis (2nd ed.). New York: Cambridge University Press.

Bandura, A. (1986). Social Foundations of Thought and Action: A Social Cognitive Theory. Englewood Cliffs, NJ: Prentice-Hall.

Bargh, J.A., Chen, M. & Burrows, L. (1996). Automaticity of social behaviour: direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71, 230–244.

Haynes, S.N. & O’Brien, W.H. (2000). Principles and Practice of Behavioural Assessment. New York: Kluwer.

Heyman, R.E. (2001). Observation of couple conflicts: clinical assessment applications, stubborn truths, and shaky foundations. Psychological Assessment, 13, 5–35.

Iwata, B.A., Pace, G.M., Dorsey, M.F., Zarcone, J.R., Vollmer, B. & Smith, J. (1994). The functions of self-injurious behaviour: an experimental-epidemiological analysis. Journal of Applied Behaviour Analysis, 27, 215–240.

Leonard, K.E. & Roberts, L.J. (1998). The effects of alcohol on the marital interactions of aggressive and nonaggressive husbands and their wives. Journal of Abnormal Psychology, 107, 602–615.

McGlynn, F.D. & Rose, M.P. (1998). Assessment of anxiety and fear. In Bellack, A.S. & Hersen, M. (Eds.), Behavioural Assessment: A Practical Handbook (4th ed., pp. 179–209). Needham Heights, MA: Allyn & Bacon.

Norton, P.J. & Hope, D.A. (2001). Analogue observational methods in the assessment of social functioning in adults. Psychological Assessment, 13, 59–72.

Reid, J.B. (Ed.) (1978). A Social Learning Approach, Vol. 2: Observation in Home Settings. Eugene, OR: Castalia.

Roberts, M.W. (2001). Clinic observations of structured parent-child interaction designed to evaluate externalizing disorders. Psychological Assessment, 13, 46–58.

Sheldon, K.M. (1999). Learning the lessons of tit-for-tat: even competitors can get the message. Journal of Personality and Social Psychology, 77, 1245–1253.

Slep, A.M.S. & O’Leary, S.G. (1998). The effects of maternal attributions on parenting: an experimental analysis. Journal of Family Psychology, 12, 234–243.

Tice, D.M., Bratslavsky, E. & Baumeister, R.F. (2001). Emotional distress regulation takes precedence over impulse control: if you feel bad, do it! Journal of Personality and Social Psychology, 80, 53–67.

Richard E. Heyman and Amy M. Smith Slep

Related Entries

Observational Methods (General), Observational Techniques in Clinical Settings

ANGER, HOSTILITY AND AGGRESSION ASSESSMENT

INTRODUCTION

Over the last 25 years, interest in measuring the experience, expression, and control of anger has been stimulated by evidence that anger, hostility and aggression were associated with hypertension and cardiovascular disease (Williams, Barefoot, & Shekelle, 1985; Dembroski, MacDougall, Williams, & Haney, 1984). While definitions of anger-related constructs are often inconsistent and ambiguous, the experience and expression of anger are typically encompassed in definitions of hostility and aggression. Clearly, anger is the most fundamental of these overlapping constructs.

On the basis of a careful review of the research literature on anger, hostility and aggression, the following definitions of these constructs were proposed by Spielberger et al. (1983: 16):

Anger usually refers to an emotional state that consists of feelings that vary in intensity, from mild irritation or annoyance to intense fury and rage. Although hostility involves angry feelings, this concept has the connotation of a complex set of attitudes that motivate aggressive behaviours directed toward destroying objects or injuring other people. The concept of aggression generally implies destructive or punitive behaviour directed towards other persons or objects.


The physiological and behavioural manifestations of anger, hostility and aggression have been investigated in numerous studies, but until recently angry feelings have been largely ignored in psychological research. Consequently, psychometric measures of anger, hostility and aggression generally do not distinguish between feeling angry and the expression of anger and hostility in aggressive behaviour. Most measures of anger-related constructs also fail to take the state–trait distinction into account, and confound the experience and expression of anger with situational determinants of angry behaviour. A coherent theoretical framework that recognizes the difference between anger, hostility and aggression as psychological constructs, and that distinguishes between anger as an emotional state and individual differences in the experience, expression and control of anger as personality traits, is essential for guiding the construction and cross-cultural adaptation of anger measures.

ASSESSMENT OF ANGER: MEASURING STATE–TRAIT AND THE EXPRESSION AND CONTROL OF ANGER

The State–Trait Anger Expression Inventory (STAXI) was developed to measure the experience, expression and control of anger (Spielberger et al., 1985; Spielberger, Krasner, & Solomon, 1988). The State–Trait Anger Scale (STAS) was constructed to assess the intensity of anger as an emotional state, and individual differences in anger proneness as a personality trait (Spielberger et al., 1983). State anger was defined as ‘... an emotional state marked by subjective feelings that vary in intensity from mild annoyance or irritation to intense fury or rage, which is generally accompanied by muscular tension and arousal of the autonomic nervous system’. Trait Anger refers to individual differences in the disposition to experience angry feelings. The STAS Trait-Anger Scale evaluates how frequently State anger is experienced.

Recognition of the importance of distinguishing between the experience and expression of anger stimulated the development of the Anger Expression (AX) Scale (Spielberger et al., 1985). The AX Scale assesses how often anger is suppressed (anger-in) or expressed in aggressive behaviour (anger-out). The instructions for responding to the AX Scale differ markedly from the traditional trait-anger instructions. Rather than directing subjects to respond according to how they generally feel, they are instructed to report on how often they react or behave in a particular manner when they feel ‘angry or furious’ (e.g. ‘I say nasty things’; ‘I boil inside, but don’t show it’) by rating themselves on the same 4-point frequency scale that is used with the Trait-Anger Scale.

The identification of anger control as an independent factor stimulated the construction of a scale to assess the control of angry feelings (Spielberger et al., 1988). The content of three of the 20 original AX Scale items (e.g. control my temper, keep my cool, calm down faster), which were included to assess intermediate levels of anger-expression as a unidimensional bipolar scale, guided the generation of additional anger control items (Spielberger et al., 1985).

The last stage in the construction of the STAXI was stimulated by the research of psycholinguists, who identified English metaphors for anger, which called attention to the need to distinguish between two different mechanisms for controlling anger expression (Lakoff, 1987). The prototype of the anger metaphor was described as a hot liquid in a container, where blood was the hot liquid and the body was the container. The intensity of anger as an emotional state is considered analogous to the variations in the temperature of the hot liquid. The metaphor ‘boiling inside’ has the connotation of an intense level of suppressed anger; ‘blowing off steam’ connotes the outward expression of angry feelings; ‘keeping the lid on’ implies controlling intense anger by preventing the outward expression of aggressive behaviour. Thus, Lakoff’s (1987) anger metaphors suggested two quite different mechanisms for controlling anger: keeping angry feelings bottled up to prevent their expression, and reducing the intensity of suppressed anger by cooling down.

In the original STAXI scale, the content of all but one of the eight Control items was related to controlling anger-out (e.g. ‘I control my temper’). Therefore, a number of new items were constructed to assess the control of anger-in by reducing the intensity of suppressed anger (Sydeman, 1995). The content of these items described efforts to calm down, cool off or relax when a person feels angry or furious. Factor analyses of the responses of large samples of male and female adults to the anger-control items identified two anger control factors for both sexes: Anger Control-In and Control-Out.


OTHER MEANS OF MEASURING ANGER

(a) Novaco Anger Inventory (Novaco, 1975, 1977): this inventory is made up of 80 anger-provoking situations. Its reliability coefficient is rather high at 0.96, within a sample of 353 students (Biaggio, 1980). This inventory has shown remarkable differences between psychiatric patients with anger problems and the normal population (Novaco, 1977).

(b) Multidimensional Anger Inventory – MAI (Siegel, 1985): it is made up of 38 items, with a five-point ‘Likert’ scale. It measures anger-in with ruminations, anger-out with ruminations, anger-incited situations and hostile attitudes. It also provides a comprehensive index of anger in all its manifestations.

(c) Harburg Anger In/Anger Out Scale (Harburg, Erfurt, Chape, Hauenstein, Schull & Schork, 1973): this scale consists of a series of hypothetical interpersonal situations which may generate anger. It is a two-dimensional scale: it measures anger-in and anger-out, while at the same time it also provides a means of measuring resentment and reasoning.

(d) Anger Self-Report Scale – ASR (Zelin, Alder & Myerson, 1972): it consists of 74 items with a six-point ‘Likert’ scale. It measures anger awareness and anger expression. The anger expression scale makes a distinction between different sub-scales or levels of expression. This test has shown an average reliability coefficient in samples of psychiatric patients and students.

(e) Anger Control Inventory: this test consists of 134 items combining ten anger-provoking situations and six scales of anger response which describe cognitive, physiological and behavioural characteristics. Its reliability coefficient varies from 0.55 to 0.89 (Hoshmand & Austin, 1987).

(f) Framingham Anger Scale: these are self-report scales developed during the Framingham Project (Haynes, Levine, Scotch, Feinleib & Kannel, 1978). These scales are used to measure anger symptoms, anger-in and anger-out, and anger expression.

(g) Subjective Anger Scale – SAS (Knight, Ross, Collins & Parmenter, 1985): it measures the patient’s proneness to experience anger by means of nine different situations and four scales of anger response.

(h) The Anger Situation Scale and the Anger Symptom Scale (Deffenbacher, Demm & Brander, 1986): they describe in detail the two worst ongoing angering situations and the two most salient physical signs of anger.

ASSESSMENT OF HOSTILITY

1 Cook-Medley Ho Scale (Cook and Medley, 1954). The Ho scale is a part of the MMPI. This scale has been widely used to measure hostility in research on Health Psychology. However, its development was shaped by research on rapport between teachers and students. Barefoot, Dodge, Peterson, Haney & Williams (1989) identified two subsets of items, which represent cognitive, affective and behavioural manifestations of hostility. Another subset of items reflects the tendency to infer hostile intent from other people’s behaviour. The remaining subset of items identifies social avoidance. Its test–retest reliability was 0.84 over a four-year interval (Shekelle, Gale, Ostfeld & Paul, 1983).

2 The Buss–Durkee Hostility Inventory – BDHI (Buss & Durkee, 1957): This scale consists of 75 items, with a true–false response scheme. It is one of the most comprehensive instruments to measure hostility. It is made up of seven sub-scales: Assault, Indirect Hostility, Irritability, Negativity, Resentment, Suspicion and Verbal Hostility. The factorial analysis of these scales reveals two well-defined factors. One of them reflects hostile expression and the other experiential aspects of hostility. Its test–retest reliability, given a two-week interval, is 0.82 for the total hostility measurement (Biaggio, Supplee & Curtis, 1981).

3 Factor L: It is a sub-scale of a more general personality inventory, Cattell’s 16 P.F. (Cattell, Eber & Tatsuoka, 1970). It is

described as a measure of suspiciousness versus trust.

CROSS-CULTURAL ASSESSMENT OF ANGER, HOSTILITY AND AGGRESSION: THE SPANISH MULTICULTURAL STATE-TRAIT ANGER EXPRESSION INVENTORY

Spanish is spoken not only in Spain, but also in more than 20 countries in Central and South America and the Caribbean, and by more than 25 million native speakers of Spanish who reside in the United States. Although Spanish is the primary language in most of Latin America and for many Hispanic residents in the U.S., the indigenous cultures of these people often have profound effects on the Spanish they speak, and on the development of personality characteristics that influence their behaviour. Therefore, it is important to recognize the exceptionally complex social and cultural diversity of Hispanic populations, and the fact that language differences between these groups may outweigh the similarities. Consequently, in adapting English measures of emotion and personality for use in Spanish-speaking cultures, care must be taken to ensure that the key words and idiomatic expressions used for assessing anger-related concepts have essentially the same meaning in different Hispanic cultural groups.

The STAXI-2 (Spielberger, 1999) was adapted to measure the experience, expression and control of anger in culturally diverse populations in Latin America, and in Spanish-speaking sub-cultures in the United States (Moscoso & Spielberger, 1999a). Toward achieving this goal, the Spanish Multicultural State-Trait Anger Expression Inventory (STAXI-SMC) was designed to measure essentially the same dimensions of anger that are assessed with the revised STAXI-2. Scales and sub-scales were constructed to assess the following dimensions with the STAXI-SMC: (a) State Anger, with sub-scales for assessing Feeling Angry and Feel Like Expressing Anger; (b) Trait Anger, with sub-scales for measuring Angry Temperament and Angry Reaction; and (c) trait scales for measuring four dimensions of anger expression and control: anger-in, anger-out, and the control of anger-in and anger-out (Moscoso & Spielberger, 1999b).

Factor analyses of responses to the 56 preliminary STAXI-SMC items confirmed the hypothesized structural properties of the inventory. The eight factors that were identified corresponded quite well with similar factors in the STAXI-2. These included two S-Anger factors, two T-Anger factors, and four anger expression and control factors (Moscoso & Spielberger, 1999a). In separate factor analyses of the S-Anger items, two distinctive factors were identified for both males and females: “Feeling Angry” and “Feel Like Expressing Anger”. However, gender differences in the strength of the item loadings on these factors raised interesting questions with regard to how Latin American men and women may differ in the experience of anger. For females, the “Feeling Angry” factor accounted for 73% of the total variance, while this factor accounted for only 19% of the variance for males. In contrast, the “Feel Like Expressing Anger” factor accounted for 70% of the total variance for males, but only 13% for females.

The factor analyses of the T-Anger STAXI-SMC items also identified separate Angry Temperament and Angry Reaction factors, providing strong evidence that the factor structure for this scale was similar to that of the STAXI-2. Factor analyses of the STAXI-SMC anger expression and control items identified the same four factors as in the STAXI-2. The items designed to assess anger-in and anger-out, and the control of anger-in and anger-out, had high loadings on the corresponding anger expression and control factors, which were similar for both sexes. The alpha coefficients for the STAXI-SMC State and Trait Anger scales and sub-scales, and the anger expression and anger control scales, were reasonably high, indicating that the internal consistency of these scales was satisfactory.

In summary, the factor analyses of the responses of the Latin American subjects to the STAXI-SMC items identified eight factors that were quite similar to those found for the STAXI-2. Factor analyses of the anger expression and control items also identified the same four factors that are found in the STAXI-2. Thus, the multi-dimensional factor structure of the STAXI-SMC for the Latin American respondents was remarkably similar to the factor structure of the English STAXI-2. The adaptation of the STAXI-2 test carried out in
Spain, using Spanish mainland natives (Miguel-Tobal, Cano-Vindel, Casado & Spielberger, 2001), is made up of 49 items with a similar factorial structure and the same sub-scales.

FUTURE PERSPECTIVES AND CONCLUSIONS

Over the last quarter century, interest in measuring the experience, expression and control of anger has been stimulated by evidence that anger, hostility and aggression were associated with health problems and life-threatening disease. While definitions of anger-related constructs are often inconsistent and ambiguous, the experience and expression of anger are typically encompassed in definitions of hostility and aggression. Clearly, anger is the most fundamental of these overlapping constructs.

A sound theoretical framework that recognizes the difference between anger, hostility and aggression, and that distinguishes between anger as an emotional state and hostility in the experience, expression and control of anger as personality traits, is essential for guiding the construction of anger measures and cross-cultural adaptation. In the cross-cultural adaptation of anger measures, it is essential to have equivalent conceptual definitions in the source and target languages that distinguish between the experience of anger as an emotional state, and hostility in the expression and control of anger as personality traits. The construction and development of the Spanish Multicultural State-Trait Anger Expression Inventory was guided by definitions of state and trait anger and anger-expression and anger-control as these constructs were conceptualized in the STAXI-2.

Factor analyses of the items constructed for the STAXI-SMC identified eight factors that were quite similar to the factor structure of the STAXI-2. Research on the STAXI-2 and the STAXI-SMC clearly indicates that anger and hostility as psychological constructs can be meaningfully defined as emotional states that vary in intensity, and as complex personality traits with major components that can be measured empirically.

The importance of anger and hostility within Psychology, and particularly within Health Psychology, calls for precise means of assessment and measurement. Some remarkable self-report tests are now available, which provide evidence of cross-cultural validity. However, in order to develop more accurate means of anger assessment, it is advisable to use and develop lesser known techniques of behavioural observation, such as self-monitoring (e.g. Meichenbaum & Deffenbacher, 1988) and interviewing. Also, research in the fields of physiological measurement and cognitive variables of anger (appraisals, attributions etc.) needs to be given a further boost. Measurement issues are a fundamental part of research on hostility and anger.

Table 1. Summary table of anger assessment scales
Columns: Scales; Assessment of Anger; Expression of Anger; Anger-In; Anger-Out; Anger-Control; Hostility
STAXI: Yes Yes Yes Yes Yes
Novaco Anger Inventory: Yes
Multi-dimensional Anger Inventory: Yes Yes Yes Yes
Harburg Anger-in/Anger-out: Yes Yes Yes
Anger Self-Report Scale: Yes Yes
Anger Control Inventory: Yes
Framingham Anger Scale: Yes Yes Yes Yes
Subjective Anger Scale: Yes
Anger Situation Scale: Yes
Anger Symptom Scale: Yes
Cook–Medley Ho: Yes
Buss–Durkee Hostility Inventory: Yes
Factor L: Yes

References

Barefoot, J.C., Dodge, K.A., Peterson, B.L., Dahlstrom, W.G. & Williams, R.B. (1989). The Cook-Medley hostility scale: item content and ability to predict survival. Psychosomatic Medicine, 51, 46–57.
Biaggio, M.K., Supplee, K. & Curtis, N. (1981). Reliability and validity of four anger scales. Journal of Personality Assessment, 45, 639–648.
Buss, A.H. & Durkee, A. (1957). An inventory for assessing different kinds of hostility. Journal of Consulting Psychology, 42, 155–162.
Cattell, R.B., Eber, H.W. & Tatsuoka, M.M. (1970). Handbook for the Sixteen Personality Factor Questionnaire (16PF). Champaign, IL: Institute for Personality and Ability Testing.
Cook, W.W. & Medley, D.M. (1954). Proposed hostility and pharisaic-virtue scales for the MMPI. Journal of Applied Psychology, 38, 414–418.
Deffenbacher, J.L., Demm, P.M. & Brandon, A.D. (1986). High general anger. Behaviour Research and Therapy, 24, 481–489.
Dembroski, T.M., McDougall, J.M., Williams, R.B. & Haney, T.L. (1984). Components of type A, hostility, and anger-in: relationship to angiographic findings. Psychosomatic Medicine, 47, 219–233.
Harburg, E., Erfurt, J.C., Chape, C., Hauenstein, L.S., Schull, W.J. & Schork, M.A. (1973). Socio-ecological stressor areas and black-white blood pressure: Detroit. Journal of Chronic Disease, 26, 595–611.
Haynes, S.N., Levine, S., Scotch, N., Feinleib, M. & Kannel, W.B. (1978). The relationship of psychological factors to coronary heart disease in the Framingham study: I. Methods and risk factors. American Journal of Epidemiology, 107, 362–363.
Hoshmand, L.T. & Austin, G.W. (1987). Validation studies of a multifactor cognitive-behavioural Anger Control Inventory. Journal of Personality Assessment, 51, 417–432.
Knight, R.G., Ross, R.A., Collins, J.I. & Parmenter, S.A. (1985). Some norms, reliability and preliminary validity data for an S-ROS: inventory of anger: the Subjective Anger Scale (SAS). Personality and Individual Differences, 6, 331–339.
Lakoff, G. (1987). Women, Fire, and Dangerous Things: What Categories Reveal about the Mind. Chicago: The University of Chicago Press.
Meichenbaum, D.H. & Deffenbacher, J.L. (1988). Stress inoculation training. The Counselling Psychologist, 16, 69–90.
Miguel-Tobal, J.J., Cano-Vindel, A., Casado, M.I. & Spielberger, C.D. (2001). Inventario de Expresión de Ira Estado Rasgo – STAXI – 2: Spanish Adaptation. Madrid: TEA.
Moscoso, M.S. & Spielberger, C.D. (1999a). Evaluación de la experiencia, expresión y control de la cólera en Latinoamerica. Revista Psicología Contemporánea, 6(1), 4–13.
Moscoso, M.S. & Spielberger, C.D. (1999b). Measuring the experience, expression, and control of anger in Latin America: the Spanish multi-cultural State-Trait Anger Expression Inventory. Interamerican Journal of Psychology, 33(2).
Novaco, R.W. (1975). Anger Control: The Development and Evaluation of an Experimental Treatment. Lexington: D.C. Heath.
Novaco, R.W. (1977). Stress inoculation: a cognitive therapy for anger and its application to a case of depression. Journal of Consulting and Clinical Psychology, 45, 600–608.
Shekelle, R.B., Gale, M., Ostfeld, A.M. & Paul, O. (1983). Hostility, risk of coronary heart disease, and mortality. Psychosomatic Medicine, 45, 109–114.
Siegel, S.M. (1985). The multidimensional anger inventory. In Chesney, M.A. & Rosenman, R.H. (Eds.), Anger and Hostility in Cardiovascular and Behavioural Disorders. Washington, DC: Hemisphere.
Spielberger, C.D. (1999). State-Trait Anger Expression Inventory – 2. Odessa, FL: Psychological Assessment Resources.
Spielberger, C.D., Jacobs, G.A., Russell, S.F. & Crane, R.S. (1983). Assessment of anger: the State-Trait Anger Scale. In Butcher, J.N. & Spielberger, C.D. (Eds.), Advances in Personality Assessment (Vol. 2, pp. 159–187). Hillsdale, NJ: Erlbaum.
Spielberger, C.D., Johnson, E.H., Russell, S.F., Crane, R.J., Jacobs, G.A. & Worden, T.J. (1985). The experience and expression of anger: construction and validation of an anger expression scale. In Chesney, M.A. & Rosenman, R.H. (Eds.), Anger and Hostility in Cardiovascular and Behavioural Disorders. New York: McGraw-Hill/Hemisphere.
Spielberger, C.D., Krasner, S.S. & Solomon, E.P. (1988). The experience, expression and control of anger. In Janisse, M.P. (Ed.), Health Psychology: Individual Differences and Stress (pp. 89–108). New York: Springer Verlag.
Sydeman, S.J. (1995). The Control of Suppressed Anger. Unpublished Master's Thesis, University of South Florida, Tampa.
Williams, R.B., Barefoot, J.C. & Shekelle, R.B. (1985). The health consequences of hostility. In Chesney, M.A. & Rosenman, R.H. (Eds.), Anger and Hostility in Cardiovascular and Behavioural Disorders (pp. 173–185). New York: Hemisphere/McGraw-Hill.
Zelin, M.I., Adler, G. & Myerson, P.G. (1972). Anger self-report: an objective questionnaire for the measurement of aggression. Journal of Consulting and Clinical Psychology, 39, 340.

Manolete S. Moscoso and Miguel Angel Pérez-Nieto

Related Entries

Type A: A Proposed Psychosocial Risk Factor for Cardio-Vascular Diseases, Dangerous/Violence Potential Behaviour

ANTISOCIAL DISORDERS ASSESSMENT

INTRODUCTION

Using a broad definition, Antisocial Disorders may be defined as pervasive, maladaptive behaviours that violate the norms and rules of a group or society, causing social impairment or distress to others. Currently, the classification and assessment of antisocial disorders may follow: (a) the medical model or (b) the dimensional model:

• The medical model uses a categorical approach in which the presence of a variety of diagnostic criteria, such as persistent violations of social norms (including lying, stealing, truancy, inconsistent work behaviour and traffic arrests), is evaluated by experts (clinicians). This model relies on diagnostic criteria as outlined in the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, 1994) and ICD (International Classification of Diseases, 1993).
• The dimensional model evaluates antisocial disorders along a continuum of development, from normal to pathological, focusing on behavioural and trait dimensions, and identifying clusters of highly interrelated behaviours and traits.

There is agreement among researchers about the development of antisocial behaviour: it begins early in life (infancy) with aggressive and oppositional behaviours (e.g. conduct problems), gradually advances toward more significant expressions of antisocial acts (e.g. vandalism, stealing, truancy, lying, substance abuse) during adolescence, and lastly, progresses to extreme forms of delinquency in adult life. The most recent longitudinal and retrospective studies (Patterson, Reid & Dishion, 1992) suggest that the 'early starters' (childhood and preadolescence) are at greater risk for adult involvement in delinquent acts and are more likely to move toward more serious offences that lead to a 'criminal career' compared to the 'later starters' (adolescence).

A variety of methods are used for assessing antisocial disorders; these include self-report instruments, others' ratings, clinical interviews (structured and semi-structured), and direct behavioural observation (see Table 1).

CHILD AND ADOLESCENT ASSESSMENT

In order to establish the severity of antisocial behaviours during childhood and adolescence, it is important: (a) to determine the age of onset; (b) to evaluate the frequency of aggressive acts; (c) to establish the variety of antisocial behaviours; and (d) to observe them in multiple settings (family, peers, school and community). As a necessary complement to this assessment, it is also important to evaluate other aspects of the individual's functioning in order to rule out the co-occurrence of other psychological disturbances.

For children and adolescents, the terms conduct disorders and conduct problems (aggressive and oppositional behaviours) may be used interchangeably. It is important to note that conduct disorders have different prevalence rates for boys and girls: 6 to 16% for boys, and 2 to 9% for girls.

In recent years, more complete assessment procedures have been developed to cover a full range of childhood and adolescent behaviours directly and indirectly linked to antisocial behaviours in different contexts. The advantages of these assessment procedures are: (a) to have a complete picture of child and adolescent functioning for the purpose of differential diagnosis and (b) to collect data to provide empirical and theoretical support of the instruments used.

Instruments For Child and Adolescent Assessment

Here we present only a few of the numerous instruments that can be used for measuring
antisocial behaviour. We included those that provide a comprehensive assessment of different psycho-social domains and those that are in some way representative of the field of antisocial behaviour, both at the level of research and intervention.

Table 1. Assessment of antisocial disorder
Columns: Target; Informant; Model; Measure (Scale, Clinical Interviews, Observation)

Target: Children/Adolescents; Informant: Self; Model: Dimensional
Scale:
– Youth Self-Report (Achenbach, 1991c)
– Behaviour Assessment System of Children – Self-Report of Personality Scale (Reynolds & Kamphaus, 1992)

Target: Children/Adolescents; Informant: Others; Model: Dimensional
Scale:
– Revised Behaviour Problem Checklist (Quay & Peterson, 1983)
– Child Behaviour Checklist (Achenbach, 1991a)
– Teacher Report Form (Achenbach, 1991b)
– Behaviour Assessment System of Children – Teacher and Parent Rating Scales (Reynolds & Kamphaus, 1992)
Observation:
– Direct Observation Form (Achenbach, 1986)
– Behaviour Assessment System of Children – Student Observation System (Reynolds & Kamphaus, 1992)
– Family Interaction Coding System (Reid, 1978)
– Observation of Peer Interactions (Dodge, 1983)

Target: Children/Adolescents; Informant: Self; Model: Categorical
Scale:
– Devereux Scales of Mental Disorders (Naglieri, Lebuffe & Pfeiffer, 1994)
– Eysenck Personality Questionnaire (Eysenck & Eysenck, 1993)
– Minnesota Multiphasic Personality Inventory – II (Butcher et al., 1989)

Target: Adults; Informant: Self; Model: Dimensional
Scale:
– Millon Clinical Multiaxial Inventory II (Millon, 1994)
– Assessment of DSM-IV Personality Disorder Questionnaire
– Antisocial Personality Questionnaire (Blackburn & Fawcett, 1999)
Clinical Interviews:
– Hare Psychopathy Checklist – Revised (Hare, 1991)

Target: Adults; Informant: Self; Model: Categorical
Clinical Interviews:
– International Personality Disorder Examination (Loranger, Sartorius & Janca, 1996)
– Structured Clinical Interview for DSM-IV (First et al., 1997)
– Structured Interview for DSM-IV Personality Disorder (Pfohl, Blum & Zimmerman, 1995)

Revised Behaviour Problem Checklist (RBPC)

The RBPC (Quay & Peterson, 1983) represents one of the first attempts to empirically classify childhood and adolescent disorders. The Revised Behaviour Problem Checklist covers the ages 5 to 17 years, and is available in two versions, one for teachers and one for mothers. It represents a revision of the original Behaviour Problems Checklist and now comprises six scales: Conduct Disorder, Socialized Aggression, Attention Problems-Immaturity, Anxiety-Withdrawal, Psychotic Behaviour and Motor Tension Excess. It allows one to distinguish between “socialized” and “under-socialized” conduct disorders. Socialized refers to antisocial behaviour within a deviant peer group; under-socialized refers to impulsivity and irritability.

Child Behaviour Checklist (CBCL)

The CBCL (Achenbach, 1991a) (parent form), together with the Youth Self-Report (YSR; Achenbach, 1991c), Teacher Report Form (TRF; Achenbach, 1991b) and Direct Observation Form (DOF; Achenbach, 1986), is one of the most comprehensive evaluation systems for childhood and adolescent psychopathology. It was developed by Achenbach in order to derive syndromes empirically and to allow for comparisons among different informants and cultures. The four forms share item content and can be used together to establish cross-context consistency.

They cover an age range of 4 to 18 years. The CBCL includes problem behaviour scales (Aggressive Behaviours, Delinquency, Anxiety/Depression, Somatic Complaints, Attention Problems, Thought Problems and Social Withdrawal) and a social competence scale. In addition, there is a Sexual Problem Behaviour scale for children between 4 and 11 years old. It is also possible to derive two broader dimensions: Internalizing and Externalizing.

Behaviour Assessment System of Children (BASC)

The BASC (Reynolds & Kamphaus, 1992) is a multi-method, multidimensional assessment instrument aimed at evaluating the behaviours and self-perceptions of children aged 4 to 18 years old. Similar to the CBCL, it has several different versions: self-report, teacher rating scale, parent rating scale, student observation system and structured developmental history. The Self-Report of Personality Scale (6–18 years) comprises the following subscales: Anxiety, Attitude to School, Attitude to Teachers, Atypicality, Depression, Interpersonal Relations, Locus of Control, Relations with Parents, Self-Esteem, Self-Reliance, Sensation Seeking, Sense of Inadequacy, Social Stress, and Somatization. The Teacher and Parent Rating Scales (different forms for 4–5 years, 6–11 years, and 12–18 years) comprise the following subscales: Aggression, Conduct Problems, Attention Problems, Hyperactivity, Anxiety, Atypicality, Depression, Somatization, Withdrawal, Learning Problems, Leadership, Social Skills, Study Skills and Adaptability. The Student Observation System assesses the student's behaviour in the classroom, such as inappropriate movement, inappropriate attention and work on school subjects.

Devereux Scales of Mental Disorders (DSMD)

The DSMD (Naglieri, Lebuffe & Pfeiffer, 1994) is designed to measure the risk for emotional and behavioural disorders in children between 5 and 18 years (5–12 years; 13–18 years). It relies on the DSM-IV, and has both teacher and parent forms. It includes scales to assess Problem Behaviours, Delinquency, Attention, Depression and Anxiety, Autism and Acute Problems. It provides three different composites: Internalizing, Externalizing and Critical Pathology.

Diagnostic Interview Schedule for Children – Child Interview (DISC-C)

The DISC-C (Costello, Edelbrock, Kalas, Kessler & Klaric, 1982) is a structured diagnostic interview that covers a broad range of DSM-IV diagnoses in children. Child, parent and teacher forms are available. Areas covered
include: Behaviour/Conduct Disorder, Attention Deficit Disorder, Affective/Neurotic Anxiety, Fears and Phobias, Obsessive-Compulsive Disorder, Schizoid/Psychotic Disorders and Affective (depression) Disorders.

Family Interaction Coding System (FICS)

The FICS (Reid, 1978) is an assessment instrument used to register interactions between family members. This coding system enables researchers and family therapists to monitor clinical cases, systematically assess the outcome of family intervention programs, and build a database for studying aggressive antisocial behaviours exhibited by children. It is composed of 29 categories, but the Total Aversive Behaviour score (covering categories such as physical negative, tease, noncompliance, destructiveness etc.) is mostly used (Reid, 1978).

Observation of Peer Interactions

This instrument (Dodge, 1983) is used to register interactions among peers between the ages of 5 and 8 years. It has five categories: solitary active, interactive play, verbalizations, physical contacts with peers, and interactions with adult leaders within the group. This system is associated with dimensions of social status (rejection, popularity), and therefore may be useful to obtain a more complete assessment of peer interactions.

ADULT ASSESSMENT

Albeit with some differences, antisocial disorders may correspond with the Antisocial Personality Disorder (APD) classification of DSM-IV and the Dissocial Personality Disorder classification of ICD-10. APD is characterized by criminal and antisocial behaviour, and also by deceitfulness, lack of remorse, disregard for the safety of others (DSM-IV, 1994), low tolerance for frustration and a low threshold for discharge of aggression (ICD-10, 1993). The emphasis is placed on a failure to conform to social norms, and on impulsivity and irresponsibility. Although it was excluded from recent classifications of mental disorders, the assessment of Psychopathy in adults with antisocial disorders may be informative for differential diagnosis and treatment purposes. Psychopathy corresponds partially to the criteria of APD, but also includes emotional/interpersonal characteristics such as glibness, superficiality, egocentricity, grandiosity, lack of empathy, manipulativeness, and shallow emotions. When assessing antisocial disorders, it is also important to evaluate the co-occurrence of substance abuse, anxiety disorders and depression.

Instruments For Adult Assessment

As for children and adolescents, numerous instruments have been developed for the assessment of adult antisocial disorders. We have chosen to present instruments that combine personality assessment (dimensional model) with classic diagnostic assessment (medical or categorical model), including interviews, checklists and questionnaires aimed at identifying the criteria for Antisocial Personality Disorders as presented in the DSM-IV and the ICD-10.

Eysenck Personality Questionnaire (EPQ-R)

The EPQ-R (Eysenck & Eysenck, 1993) is designed to measure the three traits of Eysenck's personality model: Extraversion (E), Neuroticism (N) and Psychoticism (P). This model links types, traits and behaviour into a hierarchical system. The P trait is the primary trait implicated in the development of antisocial behaviour, with elevations on E and N being secondary. In serious antisocial behaviour, the P trait has a primary role. When E is combined with high P, poor impulse control and a weakened association between behaviour and its consequences will exacerbate the P trait predisposition. Elevated E is more frequent among juvenile delinquents, and elevated N appears in adult criminals. The Eysenck Personality Inventory is also available in a form for adolescents.

Minnesota Multiphasic Personality Inventory – II (MMPI-II)

The MMPI-II (Butcher et al., 1989) is the most frequently used clinical test. It is the revised version of the MMPI. It was originally intended
for use with an adult population. The MMPI-II has 10 clinical scales, 3 validity scales and 15 content scales. The clinical scales are Hypochondriasis, Depression, Hysteria, Psychopathic Deviate, Masculinity–Femininity, Paranoia, Psychasthenia, Schizophrenia, Hypomania, and Social Introversion. The clinical scales do not discriminate clinical groups from normal groups, as the labels might suggest. Subjects who score high on specific scales show particular behaviours and tendencies. For example, subjects scoring high on the Psychopathic Deviate Scale show disregard for social custom, shallow emotions, and an inability to learn from experience. Content scales include internalizing symptoms (somatic disorder, strange beliefs and dysfunctional ways of thinking), aggressive tendencies (dysfunctional control of behaviour, cynicism), low self-esteem, family problems, work interference and negative treatment indicators. The content scales offer behavioural descriptions that are easier to interpret than the clinical scales. The interpretation of subject profiles must be done by experienced clinicians. Recently, an adolescent version has been developed.

Antisocial Personality Questionnaire (APQ)

The APQ (Blackburn & Fawcett, 1999) is a recently developed, short, multi-trait, self-report inventory aimed at measuring intrapersonal and interpersonal aspects of emotional dysfunction, impulse control, deviant beliefs about the self and others, and interpersonal problem behaviours related to antisocial behaviours. It was derived from another instrument previously developed for mentally disordered offenders. It comprises the following measures: Self-Control, Self-Esteem, Avoidance, Paranoid Suspicion, Resentment, Aggression, Deviance and Extraversion. It is possible to derive two second-order scales: Hostile-Impulsivity and Social Withdrawal. These two scales reflect orientations towards others and the self, respectively.

Hare Psychopathy Checklist – Revised (PCL-R)

The PCL-R (Hare, 1991) is a single-construct rating scale that uses a semi-structured interview, case-history information and specific diagnostic criteria to provide a reliable and valid estimate of the degree to which an offender or forensic psychiatric patient matches the traditional (prototypical) conception of a psychopath. The PCL-R evaluates emotional and interpersonal characteristics of psychopathy and social deviance.

Millon Clinical Multiaxial Inventory II (MCMI-II)

The MCMI-II (Millon, 1994) is composed of 24 self-administered scales, and is designed to measure 14 personality styles, grouped into (a) Clinical Personality Patterns (schizoid, avoidant (depressive), dependent, histrionic, narcissistic, antisocial (sadistic), compulsive and negativistic (masochistic)) and (b) Severe Personality Pathology (schizotypal, borderline and paranoid). The instrument was developed to match the DSM-IV personality disorder classifications. It also comprises 10 scales measuring other clinical syndromes (such as anxiety, depression, drug-dependence and thought disorders). This instrument also has an adolescent version.

International Personality Disorder Examination (IPDE)

The IPDE (Loranger, Sartorius & Janca, 1996) is a semi-structured interview designed for the assessment of both DSM-IV and ICD-10 Personality Disorders (PD). The IPDE also combines the categorical and dimensional models. Questions are arranged in sections (e.g. background information, work, self, interpersonal relationships).

Other Clinical Interview for DSM-IV

The most frequently used clinical interviews for the diagnosis of Antisocial Personality Disorder are:

• Structured Clinical Interview for DSM-IV (SCID II; First et al., 1997)
The SCID II is a semi-structured diagnostic interview organized by disorder which includes all DSM-IV personality disorders. A computerized administration and scoring program is available.

[8.8.2002–12:29pm] [1–128] [Page No. 33] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword 34 Antisocial Disorders Assessment

• Structured Interview for DSM-IV Personality Disorder (SIDP-IV) (Pfohl, Blum & Zimmerman, 1995)
The SIDP-IV consists of 160 questions grouped under 16 thematic sections, such as relationships, emotions and reactions to stressful situations. Questions are asked regarding behaviours in the last 5 years.

FUTURE PERSPECTIVES AND CONCLUSIONS

The assessment and diagnosis of antisocial disorders should be done by experienced mental health professionals. The assessment process should include multiple methods and informants, and use standardized instruments or structured diagnostic interviews, including complete information related to the ecology of the individual (family and social context) and individual functioning. Based on the most relevant clinical research in the area of antisociality, we may conclude that in the future the assessment must focus more on both dysfunction and skills and try to integrate the two models, dimensional and categorical, in order to better direct the diagnostic process (screening, identification and placement for intervention).

References

American Psychiatric Association (1994). Diagnostic and Statistical Manual of Mental Disorders (4th ed.). Washington, DC: APA.
Achenbach, T.M. (1986). Manual for the Child Behaviour Checklist – Direct Observation Form. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T.M. (1991a). Manual for the Child Behaviour Checklist/4–18 and 1991 Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T.M. (1991b). Manual for the Teacher's Report Form and 1991 Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Achenbach, T.M. (1991c). Manual for the Youth Self-Report and 1991 Profile. Burlington, VT: University of Vermont Department of Psychiatry.
Blackburn, R. & Fawcett, D. (1999). The Antisocial Personality Questionnaire: an inventory for assessing personality deviation in offenders. European Journal of Psychological Assessment, 14–24.
Butcher, J.N., Dahlstrom, W.G., Graham, J.R., Tellegen, A. & Kaemmer, B. (1989). Minnesota Multiphasic Personality Inventory-2 (MMPI-2), Manual for Administration and Scoring. Minneapolis: University of Minnesota Press.
Costello, A.J., Edelbrock, C.S., Kalas, R., Kessler, M. & Klaric, S. (1982). Diagnostic Interview Schedule for Children-Child Interview. Bethesda, MD: National Institute of Mental Health.
Dodge, K. (1983). Behavioural antecedents of peer social status. Child Development, 54, 1386–1399.
Eysenck, H.J. & Eysenck, S.B.G. (1993). Eysenck Personality Questionnaire – Revised. San Diego: Educational and Industrial Testing Service.
First, M.B., Gibbon, M., Spitzer, R.L. & Williams, J.B.W. (1997). User's Guide for the Structured Clinical Interview for DSM-IV Axis II Personality Disorders. Washington, DC: American Psychiatric Press.
Hare, R.D. (1991). Manual for the Hare Psychopathy Checklist – Revised. Toronto: Multi-Health Systems.
Loranger, A.W., Sartorius, N. & Janca, A. (Eds.) (1996). Assessment and Diagnosis of Personality Disorders: The International Personality Disorder Examination (IPDE). New York, NY: Cambridge University Press.
Millon, T. (1994). Manual for the Millon Clinical Multiaxial Inventory – III. Minneapolis: National Computer Systems.
Naglieri, J.A., Lebuffe, P.A. & Pfeiffer, S.I. (1994). Devereux Scales of Mental Disorders. New York: The Psychological Corporation.
Patterson, G.R., Reid, J.B. & Dishion, T.J. (1992). Antisocial Boys: A Social Interactional Approach, Vol. 4. Eugene, OR: Castalia.
Pfohl, B., Blum, N. & Zimmerman, M. (1995). Structured Interview for DSM-IV Personality (SIDP-IV). Iowa City, IA: The University of Iowa.
Quay, H.C. & Peterson, D.R. (1983). Interim Manual for the Revised Behaviour Problem Checklist. Coral Gables, FL: Author.
Reid, J.B. (1978). A Social Learning Approach, Vol. 2: Observation in Home Settings. Eugene, OR: Castalia Publishing Company.
Reynolds, C.R. & Kamphaus, R.W. (1992). BASC Manual. Circle Pines, MN: American Guidance Service.
World Health Organization (1993). The ICD-10 Classification of Mental and Behavioural Disorders: Diagnostic Criteria for Research. Geneva: WHO.

Concetta Pastorelli and Maria Gerbino

Related Entries

Applied Fields: Clinical, Dangerous/Violence Potential Behaviour


ANXIETY ASSESSMENT

INTRODUCTION

The first assessment of individual differences is reported in the Bible in the Book of Judges, Chapter 7 on Gideon. God asked Gideon, who was battling the Midianites, to thin out his troops by rejecting individuals who were both fearful and afraid of battle. However, too many men were left, so God instructed Gideon to lead his men down to the water and use the following selection procedure. Out of 10,000 persons, 300 lapped water from their hands with their tongues. They were selected. The ones who knelt to drink were not.

The present chapter will focus on the assessment of anxiety as an individual differences variable, that is, on the dimensional conceptualization of anxiety. Dimensionality arises from a personality psychology tradition, in which traits and behaviours are measured psychometrically. Traits are viewed as existing on a continuum, with low levels of a trait (e.g. anxiety) at one end and high levels of the trait at the opposite end of the same continuum. In contrast to the dimensional approach is the typological or categorical conceptualization of anxiety, consistent with the medical model (Endler & Kocovski, in press). Another entry in this encyclopaedia covers the assessment of anxiety disorders.

Definition of Anxiety

Anxiety has been conceptualized as a stimulus, as a trait, as a motive, and as a drive and has been defined 'as an emotional state, with the subjectively experienced quality of fear as a closely related emotion' (Lewis, 1970: 77). Lewis notes that the emotion is unpleasant, future-oriented, disproportional to the threat and includes both subjective and manifest bodily disturbances. There are physiological, cognitive, and behavioural components to anxiety. These give rise to the various methods of the assessment of anxiety. It is important to first distinguish between state anxiety and trait anxiety.

State vs. Trait Anxiety

State anxiety is the momentary experience of anxiety. Trait anxiety is a predisposition or proneness to be anxious. The distinction between state and trait anxiety was first suggested by Cicero (Before the Common Era). Spielberger (1983) suggested that conceptual clarity could be achieved in the anxiety literature by distinguishing between state and trait anxiety. There are various methods to assess state anxiety. The assessment of trait anxiety has been conducted primarily through the use of self-report measures.

Multidimensionality of State and Trait Anxiety

Trait anxiety and state anxiety are both multidimensional constructs (Endler, 1997; Endler, Edwards, & Vitelli, 1991). There are at least six facets of trait anxiety: social evaluation, physical danger, ambiguous, self-disclosure, separation and daily routines; and two facets of state anxiety: cognitive-worry and autonomic-emotional (Endler & Flett, 2001). These facets of state and trait anxiety are presented in Table 1.

Table 1. Anxiety assessment techniques
Anxiety          Assessment technique
State anxiety    Self-report
                 Behavioural
                 Cognitive
                 Physiological
Trait anxiety    Self-report

Interaction Model of Anxiety

The distinction between state and trait anxiety has achieved wide recognition in the interaction
model of anxiety, a subset of the interaction model of personality (Endler, 1997). According to the interaction model, increases in state anxiety will result only when a situational stressor is congruent with the facet of trait anxiety under investigation. Over 80% of the tests of the multidimensional interaction model of anxiety have yielded support for the model (Endler, 1997).

Assessment Techniques

The use of questionnaire measures has been the primary assessment technique for trait anxiety. There are multiple techniques that can be used for the assessment of state anxiety. The assessment techniques are shown in Table 1 and include self-report, behavioural, cognitive, and physiological measures. The most comprehensive method of assessing state anxiety is through a combination of the available techniques, as there are individual differences in the experience of anxiety.

SELF-REPORT MEASURES

The majority of research in the area of personality is based on self-report measures, despite the fact that personality theory also refers to observable behaviours. Self-report questionnaires have the following advantages: they are easy to administer, results are easy to analyse, results can be compared to normative data, and results can be subjected to factor analytic techniques (as well as other advanced statistical techniques).

Commonly used self-report measures are presented in Table 2. One of the first self-report anxiety measures is the Taylor Manifest Anxiety Scale (Taylor, 1953). Since then, numerous other scales have been developed. One commonly used self-report measure of anxiety is the State-Trait Anxiety Inventory (STAI; Spielberger, 1983). The STAI assesses both state and trait anxiety as unidimensional constructs. The state and the trait scales consist of 20 items each. These scales have been shown to have high internal consistency (approximately 0.90 for both the state and trait scales) and test–retest reliability for the trait scale (Spielberger, 1983).

The Endler Multidimensional Anxiety Scales (EMAS) assess both state anxiety and trait anxiety as multidimensional constructs and assess the perception of the situation (Endler, Edwards, & Vitelli, 1991). Cognitive-worry and autonomic-emotional are the two components of state anxiety assessed by the EMAS-State

Table 2. Self-report measures of anxiety
Name of scale                                  Author/year            Psychometric properties
Anxiety Sensitivity Index                      Reiss et al. (1986)    Alpha reliability = 0.88; test–retest reliability ranges from 0.75 to 0.85 (2 week interval)
Beck Anxiety Inventory                         Beck et al. (1988)     Alpha reliability = 0.92; test–retest reliability = 0.75 (1 week interval)
Endler Multidimensional Anxiety Scales (EMAS)  Endler et al. (1991)   Alpha reliabilities range from 0.89 to 0.95; test–retest reliabilities for the trait scales range from 0.60 to 0.79 (2 week interval)
EMAS-Social Anxiety Scales (EMAS-SAS)          Endler & Flett (2001)  Alpha reliabilities range from 0.92 to 0.93; test–retest reliabilities range from 0.69 to 0.77 (1 week interval)
State Trait Anxiety Inventory (STAI)           Spielberger (1983)     Alpha reliabilities range from 0.91 to 0.93; test–retest reliabilities range from 0.71 to 0.75 for the trait scale (30 day interval)
Taylor Manifest Anxiety Scale                  Taylor (1953)          Test–retest reliability = 0.88 (4 week interval)
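The two kinds of coefficient reported in Table 2 can be illustrated with a minimal sketch. It assumes that alpha reliability refers to Cronbach's alpha and that test–retest reliability is the Pearson correlation between total scores from two administrations; the function names and all scores below are hypothetical and are not taken from any of the scales listed.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest coefficient."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: 5 respondents answering a 4-item scale (made-up scores).
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]

# Hypothetical total scores for the same 5 people at two administrations.
time1 = [10, 14, 9, 20, 15]
time2 = [11, 13, 10, 19, 16]

print(round(cronbach_alpha(responses), 3))  # approximately 0.915
print(round(pearson_r(time1, time2), 3))    # approximately 0.978
```

With real data, each row of `responses` would hold one respondent's item scores on a single scale, and `time1` and `time2` would hold the same respondents' totals at the two testing occasions separated by the interval given in the table.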


measure (20 items in total). The EMAS-Trait measures assess a predisposition to experience anxiety in the following four situational domains (15 items each): social evaluation, physical danger, ambiguous, and daily routines. Recent research has resulted in the addition of the following two situational domains: self-disclosure (to family or to friends) and separation anxiety (Endler & Flett, 2001). The alpha reliabilities of these measures have been found to be highly acceptable (ranging from 0.89 to 0.95; Endler et al., 1991). Numerous studies have been conducted which have found support for the validity of the EMAS-State, Trait, and Perception scales (Endler et al., 1991; see Endler, 1997 for a review).

Another self-report instrument commonly used to assess anxiety is the Beck Anxiety Inventory (BAI; Beck, Epstein, Brown & Steer, 1988). The BAI consists of 21 items representing two factors: somatic symptoms and subjective anxiety symptoms. It has been shown to have a high internal consistency (alpha = 0.92). A weakness is that the BAI does not distinguish between state and trait anxiety. Respondents are asked to report the degree to which they have been bothered by the symptoms assessed over the past week. The BAI is primarily used in clinical settings. Finally, the Anxiety Sensitivity Index consists of 16 items and assesses the fear of experiencing anxiety (Reiss, Peterson, Gursky & McNally, 1986).

BEHAVIOURAL MEASURES

Another anxiety assessment technique is the measurement of various behaviours. The presence and frequency of certain behaviours are rated by others (e.g. clinicians, experimenters). A review of ratings by others for the purposes of clinical evaluation is beyond the scope of the present chapter. The behaviours used to represent an indication of the level of anxiety an individual is experiencing depend upon the situational domain. For example, behavioural measures of social anxiety include measurement of the maintenance of eye contact, the number of conversations initiated or amount spoken during a social encounter, hand tremors, and fidgeting (Leary, 1986). Not all of these behavioural measures are relevant for other situational domains.

Types of interaction used in behavioural observation can be classified as either artificial (i.e. a role-play situation) or naturalistic (i.e. in vivo observation; Glass & Arnkoff, 1989). Behaviours are often recorded in role-play situations due to the impracticality of rating people in naturalistic environments. Even within the naturalistic category, waiting-room type interactions are often used (especially for the assessment of social anxiety).

Behavioural observation techniques are less subjective on the part of the examinee than the use of self-report measures. However, the presence of the examiner in an evaluative role may affect the level of anxiety, and additionally, the examiner is responsible for determining whether the examinee's actual behaviour constitutes the behaviour being assessed. Furthermore, in an interaction type behavioural observation assessment, the behaviour of the partner (or confederate) may represent a confound. The partner may respond differently to different participants depending on variables such as the social skill level of the participant (Glass & Arnkoff, 1989). Despite these criticisms, behavioural assessment techniques for performance situations have been shown to be highly reliable.

COGNITIVE MEASURES

Anxiety also has a cognitive component. Cognitive measures examine the thoughts an individual has. This can be done through thought-listing procedures (Cacioppo & Petty, 1981) or via a questionnaire approach. Thought-listing techniques ask participants to record thoughts in paper and pencil format while they are in an anxious situation (Cacioppo & Petty, 1981). Participants are asked not to concern themselves with spelling or grammar and not to edit the thoughts as they arise. The list of thoughts is then analysed according to such indices as content or frequency. Variations of this technique include: (i) having participants state their thoughts aloud rather than recording them and (ii) having participants watch a video of their performance and state their thoughts during the viewing.

PHYSIOLOGICAL MEASURES

Anxiety has a physiological component, which is largely determined by the septo-hippocampal
system (behavioural inhibition system; Gray & McNaughton, 1996), thus allowing for the assessment of anxiety through physiological means. Among the physiological measures are the measurement of heart rate, electrodermal activity, and respiration. Additionally, blushing, assessed with a photoplethysmograph, has been used to assess social anxiety. The different physiological measures do not, however, correlate well with one another or with self-report measures (Leary, 1986).

Heart Rate

Heart rate is the most commonly used physiological measure of anxiety. It is assessed either via electrodes (which can be attached to the patient's skin to the right and left of the sternum) or via sensors. The unit of measurement typically used is the number of beats per minute. This can be determined by (i) counting the number of beats per minute or, alternatively, (ii) using equipment to determine the length of the interval between heart beats and then calculating beats per minute based on that figure (for example, a mean interval of 0.8 seconds between beats corresponds to 60/0.8 = 75 beats per minute). These two approaches typically yield different results; however, both are used in the assessment of heart rate as an indicator of state anxiety. Heart rate has been found to be strongly correlated with self-report state anxiety in a competitive sports situation and moderately correlated with self-report state anxiety (and one item in particular which assesses heart rate) in a performance anxiety situation (Kantor, Endler, Heslegrave & Kocovski, 2000).

Finger Pulse Volume

Finger pulse volume is a measure of digital vasoconstriction (Bloom & Trautt, 1977). The use of finger pulse volume to assess anxiety is based on the premise that one of the responses of the sympathetic nervous system is decreased blood flow to peripheral areas of the body. Finger pulse volume has been shown to be a valid physiological measure in social-evaluation situations.

Electrodermal Activity

Another physiological measure of anxiety is sweat gland activity. The eccrine sweat glands are innervated by the sympathetic nervous system and are located throughout the surface of the body. The primary concentration of eccrine sweat glands is in the palms of the hands and the soles of the feet. Changes in the degree of sweat gland activity can be a result of state anxiety; however, there are other variables that also play a role. For example, room temperature affects the activity of the eccrine glands, as do person variables (e.g. gender). There are, therefore, numerous variables that can affect the internal validity of a study that uses sweat gland activity as an indication of anxiety as the dependent variable. These need to be considered both in research studies and in the assessment of an individual.

Clements and Turpin (1996) assessed the sweat gland activity of participants while giving a presentation and while being a member of the audience. Sweat gland activity was found to increase prior to and during the presentation and decrease upon completion of the presentation. Levels of state anxiety were also found to be elevated during the presentation. There was, however, no relationship found between the physiological measure (sweat gland activity) and each of state and trait anxiety.

Respiration

Respiration rate can also be used as a tool in the assessment of anxiety. To measure respiration rate, a stretchable device, attached to equipment capable of measuring strain, is placed around the chest and the abdomen. Respiration rate has been shown to be positively related to self-reported anxiety.

Correlations among the various physiological measures of anxiety are generally found to be low. There are many factors that can account for this difference, including individual differences in the experience of anxiety and temporal factors. For example, Bloom and Trautt (1977) found that initially, participants were more anxious according to the finger pulse volume measure. However, according to heart rate, participants were more anxious later on. This provides support for the view that any measure of anxiety should be used along with other measures of anxiety. Various psychological, behavioural, and physiological processes are
involved in the experience of anxiety and there are individual differences.

SUMMARY, CONCLUSIONS, AND DIRECTIONS FOR FUTURE RESEARCH

1. Most of the research has used self-report measures. Additional research can further investigate the reliability and validity of the various techniques in the assessment of anxiety.
2. Considerably more research has been conducted on the assessment of anxiety in the social evaluation situational domain (e.g. presentation situations, interaction situations) compared to other areas. This is especially the case with respect to the use of behavioural observation, cognitive measures, and physiological measures.
3. Future research can focus on the use of these techniques for the assessment of anxiety in situations other than social evaluation situations (i.e. physical danger situations, self-disclosure situations, separation situations, and ambiguous situations).
4. There are various techniques to assess state anxiety, the momentary experience of anxiety. Included among these are self-report instruments, behavioural observation methods, cognitive assessment techniques and physiological measures.
5. Trait anxiety, the predisposition to be anxious in different situations, is assessed through self-report instruments.
6. The reliability and validity of some techniques have been demonstrated to be higher than for other techniques.
7. There are individual differences in the qualitative experience of anxiety. It is therefore important to use diverse sets of assessment techniques that tap at the various facets of anxiety.
8. Self-report measures may be the most convenient method of anxiety assessment in terms of the time required for administration, the cost of administration, and data analyses. However, other factors (i.e. the validity of the assessment) are also important to consider.

Acknowledgments

This chapter was supported, in part, by Grant No. 410-94-1473 from the Social Sciences and Humanities Research Council of Canada (SSHRC) to the first author and a SSHRC doctoral fellowship to the second author.

References

Beck, A.T., Epstein, N., Brown, G. & Steer, R.A. (1988). An inventory for measuring clinical anxiety: psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893–897.
Bloom, L.J. & Trautt, G.M. (1977). Finger pulse volume as a measure of anxiety: further evaluation. Psychophysiology, 14, 541–544.
Cacioppo, J.T. & Petty, R.E. (1981). Social psychological procedures for cognitive response assessment: the thought-listing technique. In Merluzzi, T.V., Glass, C.R. & Genest, M. (Eds.), Cognitive Assessment (pp. 309–342). New York: Guilford.
Clements, K. & Turpin, G. (1996). Physiological effects of public speaking assessed using a measure of palmar sweating. Journal of Psychophysiology, 10, 283–290.
Endler, N.S. (1997). Stress, anxiety and coping: the multidimensional interaction model. Canadian Psychology, 38, 136–153.
Endler, N.S., Edwards, J.M. & Vitelli, R. (1991). Endler Multidimensional Anxiety Scales (EMAS): Manual. Los Angeles, CA: Western Psychological Services.
Endler, N.S. & Flett, G.L. (2001). Endler Multidimensional Anxiety Scales – Social Anxiety Scales: Manual. Los Angeles, CA: Western Psychological Services.
Endler, N.S. & Kocovski, N.L. (in press). State and trait anxiety revisited. Journal of Anxiety Disorders.
Glass, C.R. & Arnkoff, D.B. (1989). Behavioural assessment of social anxiety and social phobia. Clinical Psychology Review, 9, 75–90.
Gray, J.A. & McNaughton, N. (1996). The neuropsychology of anxiety: Reprise. In Hope, D.A. (Ed.), Nebraska Symposium on Motivation, 1995: Perspectives on Anxiety, Panic, and Fear. Current Theory and Research in Motivation, Vol. 43 (pp. 61–134). Lincoln, NE: University of Nebraska Press.
Kantor, L., Endler, N.S., Heslegrave, R.J. & Kocovski, N.L. (2000). Validating Self-Report Measures of State and Trait Anxiety with a Physiological Measure. Manuscript submitted for publication.
Leary, M.R. (1986). Affective and behavioural components of shyness: implications for theory, measurement, and research. In Jones, W.H., Cheek, J.M. & Briggs, S.R. (Eds.), Perspectives on Shyness: Research and Treatment (pp. 27–38). New York: Plenum.
Lewis, A. (1970). The ambiguous word 'anxiety'. International Journal of Psychiatry, 9, 62–79.
Reiss, S., Peterson, R.A., Gursky, D.M. & McNally, R.J. (1986). Anxiety sensitivity, anxiety frequency, and the prediction of fearfulness. Behaviour Research and Therapy, 24, 1–8.


Spielberger, C.D. (1983). Manual for the State-Trait Anxiety Inventory (Form Y). Palo Alto, CA: Consulting Psychologists Press.
Taylor, J.A. (1953). A personality scale of manifest anxiety. Journal of Abnormal and Social Psychology, 48, 285–290.

Norman S. Endler and Nancy L. Kocovski

Related Entries

Personality (General), Emotions, Anxiety Disorders, Test Anxiety

ANXIETY DISORDERS ASSESSMENT

INTRODUCTION

The Multidimensional Nature of Anxiety

Anxiety is one of the most common and universal emotions. This emotional reaction to the perception of threatening or dangerous stimuli occurs throughout an individual's lifetime. In fact, anxiety elicited by stimuli or situations such as animals, physical danger and separation is an early biological acquisition, whose function is to protect the child from potential dangers. In this sense, anxiety is undoubtedly of value in relation to the preservation of the human being.

The conceptualization of anxiety has varied considerably over recent decades. On the one hand, critics of the unidimensional view of anxiety have proposed a new multidimensional approach. From this perspective, anxiety is a combination of responses, including cognitive, physiological and behavioural (motor) reactions. These responses are provoked by identifiable cognitive-subjective, physiological or environmental stimuli. In spite of the lack of an accurate explanation of the contents of each system, and there being some discrepancies among authors on what might be understood by the responses of the cognitive system or, to a lesser extent, those of the physiological system (Cone & Hawkins, 1977; Fernández-Ballesteros, 1983), this classification of the different anxiety responses in three systems is widely accepted and used.

In addition, since the seminal works of Cattell or Spielberger in the 1960s, the differentiation between state and trait anxiety has become a classic one. State anxiety is conceptualized as a transitory emotional reaction to the individual's perception of a threatening or dangerous situation, while trait anxiety is defined as a relatively stable tendency to interpret situations as threatening or dangerous, and to react to them with anxiety. Recent works by Endler and his co-workers propose a multidimensional nature for trait anxiety, highlighting the existence of different facets (social evaluation, physical danger, etc.) closely related to specific situational areas.

With the aim of integrating the above-mentioned aspects, anxiety must be considered as an emotional response, or pattern of responses, that includes unpleasant cognitive aspects, physiological aspects characterized by high arousal of the autonomic nervous system, and inaccurate and less adaptive motor or behavioural reactions. The anxiety response may be provoked both by external situational stimuli and by internal stimuli such as thoughts, ideas, images, etc., perceived by the individual as threatening or dangerous. Such anxiety-eliciting stimuli (external or internal) will be mainly determined by the subject's characteristics; thus, there are remarkable individual differences in relation to the tendency to manifest anxiety reactions in different situations (Miguel-Tobal, 1990).

ANXIETY AS DISORDER

Up to now, we have considered anxiety as a normal emotional response of an individual to different situations or circumstances. However, when its frequency, intensity and duration are excessive, producing serious limitations in different facets of individuals' lives and reducing their ability to adapt to the environment, we must talk about pathological anxiety.


Anxiety is closely related to anxiety disorders, depression, disorders traditionally labelled as neurotic, many psychotic disorders, and a wide variety of psychophysiological problems such as cardiovascular disorders, peptic ulcers, headaches, premenstrual syndrome, asthma, skin disorders, and so on. It is also involved in sexual disorders, addictive behaviour and eating disorders; more recently, there are findings that relate anxiety to weakness of the immune system. Due to the wide variety of problems in which this emotion plays an important role, anxiety must be considered a central aspect of psychopathology and health psychology. In fact, thousands of persons with anxiety problems seek attention in hospitals, health centres, etc., and this results in an important economic cost to public health services.

Table 1. DSM-IV–DSM-IV-TR Classification
Codes     Anxiety Disorders
300.01    Panic disorder without agoraphobia
300.21    Panic disorder with agoraphobia
300.22    Agoraphobia without history of panic disorder
300.29    Specific phobia
300.23    Social phobia (Social Anxiety Disorder, added in DSM-IV-TR)
300.3     Obsessive-compulsive disorder
309.81    Post-traumatic stress disorder
308.3     Acute stress disorder
300.02    Generalized anxiety disorder (includes overanxious disorder of childhood in DSM-IV-TR)
293.84    Anxiety due to a medical condition
Variable  Substance-induced anxiety disorder
300.00    Anxiety disorder not otherwise specified

Anxiety Disorders

Anxiety disorders constitute the most common psychopathology, followed by affective disorders and drug and alcohol consumption. The lifetime prevalence rate is 19.5% for females and 8% for males (Robins, Helzer & Weissman, 1984).

The classifications of anxiety disorders have varied over recent years. The most widely used are the ICD-10 (World Health Organization, 1992), the DSM-IV (American Psychiatric Association, 1994) and the DSM-IV-TR (American Psychiatric Association, 2000). The DSM-IV and DSM-IV-TR will be used as reference sources, and are shown in Table 1.

Anxiety Disorders Assessment

Changes in the theoretical frameworks of anxiety research that occurred in the late 1960s have not been accurately reflected in assessment procedures and instruments, especially in self-report measures, which are the most widely used. This has impeded the consolidation of a systematic research line focused on different aspects of anxiety in several anxiety disorders.

The works of Lacey (1967) and Lang (1968) proposed the multidimensional nature of anxiety responses and the existence of three relatively independent response systems (cognitive, physiological, and motor responses), while the interactive model (Endler, 1973) stressed the multidimensionality of trait anxiety (Endler & Magnusson, 1974, 1976). Finally, the discovery of individual differences in relation to the tendency to experience anxiety in some situations, but not in others, led to theoretical advances that have not yet been sufficiently applied in research on anxiety disorders.

With the aim of including all of these theoretical advances in an assessment instrument, we developed the Inventory of Situations and Responses of Anxiety (ISRA, Miguel-Tobal & Cano Vindel, 1986, 1988 & 1994). The ISRA is a self-report instrument for a multidimensional and interactive assessment of anxiety that permits the evaluation of the three response systems (cognitive, physiological and motor responses), trait anxiety, and four situational areas or specific traits (test anxiety, interpersonal anxiety, phobic anxiety and daily life anxiety).

Several studies have explored differential anxiety characteristics, in both anxiety disorders and psychophysiological disorders, through the ISRA. Such studies indicate that there are characteristic profiles in different pathologies that can be relevant in both the research and clinical practice contexts (see Miguel-Tobal & Cano Vindel, 1995).

INSTRUMENTS AND PROCEDURES

A large number of procedures and instruments have been used for the assessment of anxiety,
including self-reports, physiological procedures and behavioural methods. More information on this issue can be found in Endler and Kocovski’s entry ‘Anxiety Assessment’ in this same volume. Here we shall focus especially on the instruments developed for the assessment of different anxiety disorders. It should be noted that procedures for the assessment of general anxiety are also commonly used in clinical practice.

Broad Screening

Several structured interviews have been used in order to determine the onset of an anxiety disorder or to make a more accurate diagnosis. Two good examples are the Anxiety Disorder Interview Schedule – Revised (Di Nardo et al., 1985), and the Structured Clinical Interview for DSM-IV Axis I disorders (Spitzer, Gibbon & Williams, 1996).

With regard to specific disorders, some widely used instruments and procedures are:

Panic Disorder Assessment

The most widely used self-report instrument for the assessment of panic attacks is the Panic Attack Questionnaire (PAQ, Norton, 1988).

Agoraphobia Assessment

In the assessment of agoraphobia, both self-reports and behavioural measures have been used. Among self-reports, the Agoraphobic Cognitions Questionnaire (ACQ), along with its companion measure, the Body Sensations Questionnaire (BSQ), was devised to assess ‘fear of fear’ (Chambless, Caputo, Bright & Gallagher, 1984). Among behavioural measures, there are two kinds of devices: one type that measures avoidance behaviours, an example of which is the Individualized Behavioural Avoidance Test (IBAT, Agras, Leitenberg & Barlow, 1968), and another type for measuring the time and distance walked away from a ‘safe’ place as a cue for the intensity of agoraphobic reactions (see Emmelkamp, 1982). It should be noted that assessment instruments designed for phobia, social phobia, and panic attacks are also used in the evaluation of agoraphobia.

Specific Phobia Assessment

The most frequently used instruments are self-reports, such as the Fear Survey Schedule I (Lang & Lazovik, 1963) and Fear Survey Schedule III (Wolpe & Lang, 1964), for measuring the type and intensity of irrational fears and fear-eliciting stimuli. Also used are behavioural avoidance measures, such as the Behavioural Avoidance Test (Lang & Lazovik, 1963) and the Behavioural Avoidance Slide Test (Burchardt & Levis, 1977). It should be noted that some of these instruments are also used for the assessment of social phobia and agoraphobia.

Social Phobia Assessment

The Social Avoidance and Distress Scale (SADS), the Fear of Negative Evaluation Scale (FNE, Watson & Friend, 1969), the Suinn Test Anxiety Behaviour Scale (STABS, Suinn, 1969) and the Social Reaction Inventory Revised (SRI-R, Curran, Corriveau, Monti & Hagerman, 1980) are used for assessing social skills, while the Social Phobia and Anxiety Inventory (SPAI, Turner, Beidel, Dancu & Stanley, 1989) is also employed. Among behavioural measures, the Social Interaction Test (SIT, Trower, Bryant & Argyle, 1978) is designed for measuring social skills in a test anxiety-provoking situation by means of role-play procedures.

Obsessive-Compulsive Disorder Assessment

The most important self-report measures used are the Leyton Obsessional Inventory (LOI, Cooper, 1970), the Compulsive Activity Checklist (CAC, Philpott, 1975) and the Maudsley Obsessional-Compulsive Inventory (MOCI, Hodgson & Rachman, 1977).

Post-Traumatic Stress Disorder Assessment

There are several methods for the assessment of PTSD, including clinical interviews, self-report instruments and psychophysiological measures. For the purpose of this entry we consider general-oriented instruments rather than special populations-oriented ones (combat
survivors, rape victims, etc.), except for psychophysiological measures. Two good examples of clinical interviews are the Clinician-Administered PTSD Scale (CAPS-1, Blake, Weathers, Nagy, Kaloupek, Klauminzer, Charney & Keane, 1990), and the PTSD Symptom Scale Interview (PSS-I, Foa, Riggs, Dancu & Rothbaum, 1993). Two good examples of self-report instruments are the Revised Impact of Events Scale (RIES, Horowitz, Wilner & Alvarez, 1979), and the PTSD Diagnostic Scale (PDS, Foa, 1995). Finally, data from laboratory studies provide evidence that psychophysiological measurement is a valuable tool in the assessment of PTSD. Studies with combat populations reveal that cardiovascular measures (heart rate and blood pressure) have generally shown good specificity and sensitivity in PTSD classification (see Lating & Everly, 1995; Miguel-Tobal, González Ordi & López Ortega, 2000).

Generalized Anxiety Disorder (GAD) Assessment

Given the lack of specificity of GAD, general anxiety assessment instruments, including the State-Trait Anxiety Inventory (STAI, Spielberger, Gorsuch & Lushene, 1970), the Beck Anxiety Inventory (BAI, Beck, Epstein, Brown, & Steer, 1988), the Anxiety Sensitivity Index (ASI, Reiss, Peterson, Gursky & McNally, 1986), the Endler Multidimensional Anxiety Scales (EMAS, Endler, Edwards & Vitelli, 1991) and, in Spain, the Inventory of Situations and Responses of Anxiety (ISRA, Miguel-Tobal & Cano Vindel, 1986, 1988, 1994), have been used for its evaluation.

As can be seen, there are very few references to physiological measures in this review since, though commonly used in clinical research, they have not generally shown enough specificity to discriminate between different anxiety disorders, except, as mentioned earlier, in the case of PTSD.

Finally, we should stress the appropriateness of using multiple instruments that allow the assessment of general anxiety on the one hand and the evaluation of a specific disorder or disorders on the other. Clinical practice reveals that it is hard to find a pure disorder, since, as Wittchen (1987) points out, the comorbidity rate for anxiety disorders is 68%: in other words, two out of every three patients also present another anxiety disorder.

SUMMARY, CONCLUDING REMARKS, AND DIRECTIONS FOR FUTURE RESEARCH

Anxiety disorder assessment has mainly been carried out using self-reports, and to a lesser extent behavioural measures. Physiological measures do not provide sufficient specificity to delimit or evaluate specific disorders; however, there is a promising line of research in relation to PTSD.

In addition to this lack of specificity with regard to anxiety disorders, due to the overlapping of their symptoms, it is also important to consider the problem of their high comorbidity (68% for anxiety disorders and 50% for depression). Taking these aspects into account, it is necessary to carry out a wide-spectrum assessment that includes general anxiety measures, specific disorder measures and measures of depression.

Theoretical advances in the study of anxiety and research on measurement procedures have fostered multisystem-multimethod assessment, but such advances have been weakly reflected in anxiety disorder assessment research, and have had even less impact on clinical practice. This is one of the challenges for the future, which it is to be hoped will see the development of new multidimensional instruments through the integration of data derived from self-reports, physiological records and behavioural measures.

References

Agras, W.S., Leitenberg, H. & Barlow, D.H. (1968). Social reinforcement in the modification of agoraphobia. Archives of General Psychiatry, 19, 423–427.
American Psychiatric Association (1994). Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) (4th ed.). Washington, DC: APA.
American Psychiatric Association (2000). Desk Reference to the Diagnostic Criteria from DSM-IV-TR. Washington, DC: APA.
Beck, A.T., Epstein, N., Brown, G. & Steer, R.A. (1988). An inventory for measuring clinical anxiety: psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893–897.


Blake, D.D., Weathers, F.W., Nagy, L.M., Kaloupek, D.G., Klauminzer, G., Charney, D. & Keane, T.M. (1990). A clinical rating scale for assessing current and lifetime PTSD: the CAPS-1. Behaviour Therapist, 18, 187–188.
Burchardt, C.J. & Levis, D.J. (1977). The utility of presenting slides of a phobic stimulus in the context of a behavioural avoidance procedure. Behaviour Therapy, 8, 340–346.
Chambless, D.L., Caputo, G.C., Bright, P. & Gallagher, R. (1984). Assessment of fear of fear in agoraphobics: the Body Sensations Questionnaire and the Agoraphobic Cognitions Questionnaire. Journal of Consulting and Clinical Psychology, 52, 1090–1097.
Cone, J.D. & Hawkins, R.P. (1977). Behavioural Assessment: New Directions in Clinical Psychology. New York: Brunner-Mazel.
Cooper, J. (1970). The Leyton Obsessional Inventory. Psychological Medicine, 1, 48–64.
Curran, J.P., Corriveau, D.P., Monti, P.M. & Hagerman, S.B. (1980). Social skill and social anxiety. Behaviour Modification, 4, 493–512.
Di Nardo, P.A., Barlow, D.H., Cerny, J.A., Vermilyea, B.B., Vermilyea, J.A., Himadi, W.G. & Waddell, M.T. (1985). Anxiety Disorders Interview Schedule-Revised (ADIS-R). Albany, NY: Center for Stress and Anxiety Disorders.
Emmelkamp, P.M.G. (1982). Phobic and Obsessive-Compulsive Disorders: Theory, Research and Practice. New York: Plenum Press.
Endler, N.S. (1973). The person versus the situation a pseudo issue? a response to others. Journal of Personality, 41, 287–303.
Endler, N.S., Edwards, J.M. & Vitelli, R. (1991). Endler Multidimensional Anxiety Scales (EMAS): Manual. Los Angeles, CA: Western Psychological Services.
Endler, N.S. & Magnusson, D. (1974). Interactionism, trait psychology, psychodynamics, and situationism. Report from the Psychological Laboratories. University of Stockholm, No 418.
Endler, N.S. & Magnusson, D. (Eds.) (1976). Interactional Psychology and Personality. Washington, DC: Hemisphere Publishing Co.
Fernández-Ballesteros, R. (1983). Psicodiagnóstico. Madrid: UNED.
Foa, E.B. (1995). PDS (Posttraumatic Stress Diagnostic Scale). Manual. Minneapolis: National Computer System.
Foa, E.B., Riggs, D.S., Dancu, C.V. & Rothbaum, B.O. (1993). Reliability and validity of a brief instrument for assessing post-traumatic stress disorder. Journal of Traumatic Stress, 6, 459–473.
Hodgson, R.J. & Rachman, S. (1977). Obsessional-compulsive complaints. Behaviour Research and Therapy, 15, 389–395.
Horowitz, M.J., Wilner, N. & Alvarez, W. (1979). Impact of Event Scale: A measure of subjective distress. Psychosomatic Medicine, 41, 207–218.
Lacey, J.I. (1967). Somatic responses patterning and stress: some revisions of the activation theory. In Appley, M.H. & Trumbull, R. (Eds.), Psychological stress: Issues in Research (pp. 14–42). New York: Appleton-Century-Crofts.
Lang, P.J. (1968). Fear reduction and fear behaviour: problems in treating a construct. In Shlien, J.M. (Ed.), Research in Psychotherapy, Vol. III (pp. 90–103). Washington, DC: American Psychological Association.
Lang, P.J. & Lazovik, A.D. (1963). Experimental desensitization of a phobia. Journal of Abnormal and Social Psychology, 66, 519–525.
Lating, J.M. & Everly, G.S. (1995). Psychophysiological assessment of PTSD. In Everly, G.S. & Lating, J.M. (Eds.), Psychotraumatology: Key Papers and Core Concepts in Post-Traumatic Stress (pp. 129–145). New York: Plenum Press.
Miguel-Tobal, J.J. (1990). La ansiedad. In Mayor J. & Pinillos J.L. (Eds.), Tratado de Psicología General. Vol. 8: Motivación y Emoción (pp. 309–344). Madrid: Alhambra.
Miguel-Tobal, J.J. & Cano Vindel, A.R. (1986). Inventario de Situaciones y Respuestas de Ansiedad (Inventory of Situations and Responses of Anxiety). (1988 & 1994, 2nd and 3rd revisions, respectively). Madrid: TEA Ediciones.
Miguel-Tobal, J.J. & Cano Vindel, A. (1995). Perfiles diferenciales de los trastornos de ansiedad. Ansiedad y Estrés, 1, 37–60.
Miguel-Tobal, J.J., González Ordi, H. & López Ortega, E. (2000). Estrés postraumático: hacia una integración de aspectos psicológicos y neurobiológicos. Ansiedad y Estrés, 6, 255–280.
Norton, G.R. (1988). Panic Attack Questionnaire. In Hersen, M. & Bellack, A.S. (Eds.), Dictionary of Behavioural Assessment Techniques (pp. 332–334). New York: Pergamon Press.
Philpott, R. (1975). Recent advances in the behavioural measurement of obsessional illness: Difficulties common to these and other instruments. Scottish Medical Journal, 20, 33–40.
Reiss, S., Peterson, R.A., Gursky, D.M. & McNally, R.J. (1986). Anxiety sensitivity, anxiety frequency, and the prediction of fearfulness. Behaviour Research and Therapy, 24, 1–8.
Robins, L.N., Helzer, J.E. & Weissman, M.M. (1984). Lifetime prevalence of specific psychiatric disorders in three sites. Archives of General Psychiatry, 41, 949–958.
Spielberger, C.D., Gorsuch, R.L. & Lushene, R.E. (1970). STAI. Manual for the State-Trait Anxiety Inventory (Self-Evaluation Questionnaire). Palo Alto, CA: Consulting Psychologists Press.
Spitzer, R.L., Gibbon, M. & Williams, J.B.W. (1996). Structured Clinical Interview for DSM-IV Axis I Disorders. New York: New York State Psychiatric Institute, Biometrics Research Department.
Suinn, R. (1969). The STABS, a measure of test anxiety for behaviour therapy: normative data. Behaviour Research and Therapy, 7, 335–339.
Trower, P., Bryant, B. & Argyle, M. (1978). Social Skills and Mental Health. Pittsburgh: University of Pittsburgh Press.


Turner, S.M., Beidel, D.C., Dancu, C.V. & Stanley, M.A. (1989). An empirically derived inventory to measure social fears and anxiety: The Social Phobia and Anxiety Inventory. Psychological Assessment, 1, 35–40.
Watson, D. & Friend, R. (1969). Measurement of social-evaluative anxiety. Journal of Consulting and Clinical Psychology, 33, 448–457.
Wittchen, H.U. (1987). Epidemiology of panic attacks and panic disorder. In Hand, I. & Wittchen, H.U. (Eds.), Panic and phobias (1). Empirical Evidence of Theoretical Models and Long-Term Effects of Behavioural Treatments. New York: Springer-Verlag.
Wolpe, J. & Lang, P.J. (1964). A fear survey schedule for use in behaviour therapy. Behaviour Research and Therapy, 2, 27–30.
World Health Organization (1992). International Classification of Diseases and Related Health Problems. ICD-10. Geneva: WHO.

Juan José Miguel-Tobal and Héctor González-Ordi

Related Entries

Applied fields: Clinical, Anxiety, Emotions, Test anxiety

APPLIED BEHAVIOURAL ANALYSIS

INTRODUCTION

Applied behaviour analysis is a science in which procedures derived from the principles of behaviour are systematically applied to improve socially meaningful behaviour that can be rigorously defined and objectively detected and measured (Cooper et al., 1987). As pointed out by Moore (1999), behaviour analysis has developed three components, as well as a philosophy of science: (1) the experimental analysis of behaviour, the basic science of behaviour, (2) applied behaviour analysis, the systematic application of behavioural technology, and (3) the conceptual analysis of behaviour, the philosophical analysis of the subject matter of behaviour analysis. The philosophy that guides behaviour analysis is called radical behaviourism. Even though the link between the experimental and applied components of behaviour analysis is not as close as it should be, bridges are being built between basic and applied work, such as the work being conducted in the areas of establishing fluency and building momentum (Mace, 1996). The impact of bridge studies has been especially pronounced in functional analysis methodologies on aberrant behaviour (Wacker, 2000). This article will focus on important aspects of functional assessment.

CHARACTERISTICS AND AREAS OF INTEREST

Baer, Wolf, and Risley (1968) list seven defining characteristics of applied behaviour analysis: behaviour or stimuli studied are selected because of their significance to society rather than their importance to theory (applied). The behaviour chosen must be the behaviour in need of improvement and it must be measurable (behavioural). It requires a demonstration of the events that can be responsible for the occurrence or non-occurrence of that behaviour (analytic). The interventions must be completely identified and described (technological). The procedure for behaviour change is described in terms of the relevant principles from which it is derived (conceptual systems). The behavioural techniques must produce effects that are significant for practical value (effective). The behavioural change must be stable over time, appear consistently across situations, or spread to untrained responses (generality).

The writings of B. F. Skinner have inspired behaviour analysts to develop basic concepts of reciprocal behaviour–environment interactions. Over fifty years of research and application have shown the usefulness of these basic concepts in understanding many forms of behaviour, as well as in guiding effective
behaviour-change strategies. The knowledge of stimulus control (when the presentation of a stimulus changes some measures of behaviour) and reinforcement (the process by which the frequency of an operant (class of responses) is increased) has been useful in the analysis and treatment of human behaviour problems, as well as in creating novel behaviour, since the inception of applied behaviour analysis. Applied behaviour analysis has played a prominent role in the treatment of individuals with autism and/or developmental disabilities. The areas of interest have, however, been expanding, e.g. school settings, treatment of habit disorders, paediatrics, troubled adolescent runaways, brain-injury rehabilitation, behavioural psychotherapy, organizational management, performance analysis, consultation, , college teaching, and behavioural medicine (e.g. Austin and Carr, 2000).

ASSESSMENT

The role of assessment in applied behaviour analysis has been described as the process of identifying a problem and identifying how to alter it for the better. Furthermore, it involves selecting and defining the behaviour (target behaviour) to be changed. Two questions have been essential in behavioural assessment: “(a) What types of assessment methodologies provide reliable and valid data about behavioural function, and how can they be adapted for use in a particular situation? and (b) How might the results of such assessments improve the design and selection of treatment procedures?” (Neef & Iwata, 1994: 211). As we shall examine further, behaviour is assumed to be a function of current environmental conditions, namely antecedent and consequent stimuli, and it is predicted to be stable as long as the specific environmental conditions remain stable. By contrast, traditional approaches or non-behavioural therapies assume that the behaviour is a function of enduring, underlying mental states or personal variables. One premise is that the client’s verbal behaviour (what people talk about, what they do and why they do it) is considered important because it is believed to be reflective of a person’s inner state and the mental processes that govern a person’s behaviour (Cooper et al., 1987). This is quite different from a behaviour analytic view where a distinction is made between what people say they do and what they do (Skinner, 1953), and the focus is on behaviour for its own sake.

Function versus Structure

Behaviour can be classified either structurally or functionally. When we talk about a structural approach, behaviour is classified or analysed in terms of its form. For example, within developmental psychology, the structural approach is a prominent approach in which researchers investigate what children do at specific stages of development, e.g. the behaviour is studied to draw inferences about cognitive abilities and so-called hypothetical structures, such as object permanence or Piagetian schemes. In behaviour analysis, the topography or structure of a response is determined by the contingencies of this behaviour. Instead of inferring such cognitive abilities, the researchers consider the history of reinforcement to be responsible for the child’s capability (Pierce & Epling, 1999). Structural approaches to assessment are exemplified by diagnostic, personality and psychodynamic approaches to human behaviour, while functional explanations focus on the relationships between what happens to the organism (i.e. stimuli) and the behaviour of the organism (responses) (Sturmey, 1996). The controversy between the functional and structural approaches is quite similar to the debate in biology on the separation of physiology and anatomy, and also to Skinner’s treatment of verbal behaviour (function: without regard to modality (vocal, gestures etc.), the field of verbal behaviour is concerned with the behaviour of individuals and the functional units of their verbal behaviour) versus language (structure: the consistencies of vocabulary and grammar) (Catania, 1998).

Functional Assessment

Early in the development of behaviour analysis, Skinner (1938) argued that behaviour did not take place in a vacuum and a response must have a function. Empirical demonstrations of “cause–effect relationships” between environment and behaviour have been rendered possible by functional analysis (Skinner, 1953). Since then comprehensive methods to systematically
assess particular functions of different types of behaviour have been developed, and functional assessment is one of the most intense research areas in our field (see for example Iwata et al., 2000; Repp & Horner, 1999).

Functional assessment is an umbrella term and encompasses: (1) indirect assessments, which are characterized by interviews and questionnaires about behavioural functions. They are based on subjective verbal reports in the absence of direct observation. Two recognized indirect methods are the Motivation Assessment Scale (Durand & Crimmins, 1988) and the Motivation Analysis Rating Scale (Wieseler et al., 1985); (2) descriptive assessments, which involve no manipulation of relevant variables and are based on direct observation, e.g. the antecedent–behaviour–consequence assessment (ABC) or scatter plot assessment; (3) functional experimental analyses or analogue functional assessment, which involve manipulation of suspected maintaining variables using experimental methodology to demonstrate control over responding (Desrochers et al., 1997). The first two approaches are approximations compared to the third because they do not elucidate functional relationships, and both are characteristically non-experimental. Furthermore, the functional experimental analysis is most effective in identifying the function of problem behaviour (Carr et al., 1999).

Experimental functional analysis or analogue functional assessment

Since the prominent publication by Iwata et al. (1982) there has been a remarkable increase in publications concerning experimental functional analysis (see Journal of Applied Behaviour Analysis). Experimental functional analysis represents a simulation of the natural environment and will be the primary tool for demonstrating causal relationships (Carr et al., 1999). Experimental functional analysis methodologies can be used to identify: (1) antecedent conditions (setting events, establishing operations and/or discriminative stimuli) under which behaviour occurs, and these conditions may then be altered so that problem behaviours are less likely, (2) reinforcement contingencies that must be changed, (3) whether the same reinforcer that currently maintains the behaviour problem may be used in establishing and strengthening alternative behaviours, and (4) those reinforcers and/or treatment components that are relevant (Iwata et al., 2000).

Results from the research on functional analysis methodologies have shown that functional analyses are effective in identifying environmental determinants of self-injurious behaviour (SIB), and subsequently, in guiding the process of treatment selection (Iwata et al., 1994). Furthermore, results have shown that the growing use of functional assessment based interventions has increased the number of studies using non-aversive procedures (Carr et al., 2000).

Recording techniques

In applied behaviour analysis it is important to demonstrate that a particular intervention has been responsible for a particular behaviour change. Therefore, measurement is very important with respect to designing successful interventions and evaluating treatment changes. Automatic recording, permanent products, and direct observational recording are procedures used for measuring and recording behaviour. Direct observational recording includes frequency or event, duration, or latency recording, and the recording can be either continuous, time sampling or interval (Cooper et al., 1987). Objectivity, clarity and completeness have been set forth as three criteria of an adequate response definition (Kazdin, 1982).

Experimental designs

In experimental functional analyses various experimental designs have been used to rule out the possibility that changes in extraneous variable(s), rather than in the independent variable, could be responsible for the change in the dependent variable, i.e. to eliminate rival explanations. Thus, these experimental designs have been used to study the functional relationships between environmental changes and changes in target behaviour. Typically, N=1 designs (within-subject manipulation, single-case research designs) have been used in applied behaviour analysis, and the designs have been categorized as ABAB designs, multiple baseline designs, multiple treatment designs and changing criterion designs (Kazdin, 1982). The multi-element design (multiple treatment designs) has
typically been used in experimental functional analysis (e.g. Iwata et al., 1982).

In single-case research, replication, either direct or systematic, is crucial for evaluating generality of intervention effects across subjects. The term direct replication has been used when the same procedures have been used across a number of different subjects, while systematic replication indicates that features (e.g. types of subjects, intervention, target behaviour) of the original experiment vary. By replicating in this way, knowledge will be accumulated, and behaviourists will be pyramid builders.

CONCLUSION AND FUTURE PERSPECTIVES

Different aspects of behavioural assessment, such as indirect assessment, descriptive assessment and experimental functional analysis, have been discussed. Extension and refinement of behavioural assessment and functional analysis technologies will, hopefully, provide for even more effective methods in establishing behaviour and treating maladaptive behaviour. In addition, the advancement of computer technology allows for more simplified assessment techniques. Until now functional assessment technologies have primarily focused on non-compliance and self-injurious and aggressive behaviour in persons with disabilities and autism, but advancements in these procedures will include their application to other types of behaviour and a larger diversity of problem behaviour in populations other than persons with autism and disabilities.

References

Austin, J. & Carr, J.E. (Eds.) (2000). Handbook of Applied Behaviour Analysis. Reno, Nevada: Context Press.
Baer, D.M., Wolf, M.M. & Risley, T.R. (1968). Some current dimensions of applied behaviour analysis. Journal of Applied Behaviour Analysis, 1, 91–97.
Carr, E.G., Langdon, N.A. & Yarbrough, S.C. (1999). Hypothesis-based intervention for severe problem behaviour. In Repp, A.C. & Horner, R.H. (Eds.), Functional Analysis of Problem Behaviour (pp. 9–31). Belmont, CA: Wadsworth Publishing Company.
Carr, J.E., Coriaty, S. & Dozier, C.L. (2000). Current issues in the function-based treatment of aberrant behaviour in individuals with developmental disabilities. In Austin, J. & Carr, J.E. (Eds.), Handbook of Applied Behaviour Analysis (pp. 91–112). Reno, Nevada: Context Press.
Catania, A.C. (1998). Learning (4th ed.). Englewood Cliffs, New Jersey: Prentice-Hall.
Cooper, J.O., Heron, T.E. & Heward, W.L. (1987). Applied Behaviour Analysis. Merrill Publications: Columbus.
Desrochers, M.N., Hile, M.G. & Williams-Moseley, T.L. (1997). Survey of functional assessment procedures used with individuals who display mental retardation and severe problem behaviours. American Journal on Mental Retardation, 5, 535–546.
Durand, M. & Crimmins, D.B. (1988). Identifying the variables maintaining self-injurious behaviour in a psychotic child. Journal of Autism and Developmental Disorders, 18, 99–117.
Iwata, B.A., Dorsey, M.F., Slifer, K.J., Bauman, K.E. & Richman, G.S. (1982). Toward a functional analysis of self-injury. Analysis and Intervention in Developmental Disabilities, 2, 3–20.
Iwata, B.A., Kahng, S.W., Wallace, M.D. & Lindberg, J.S. (2000). The functional analysis model of behavioural assessment. In Austin, J. & Carr, J.E. (Eds.), Handbook of Applied Behaviour Analysis (pp. 61–89). Reno, Nevada: Context Press.
Iwata, B.A., Pace, G.M., Dorsey, M.F., Zarcone, J.R., Vollmer, T.R., Smith, R.G., Rodgers, T.A., Lerman, D.C., Shore, B.A., Mazelski, J.L., Goh, H.-L., Cowdery, G.E., Kalsher, M.J., McCosh, K.C. & Willis, K.D. (1994). The functions of self-injurious behaviour: an experimental-epidemiological analysis. Journal of Applied Behaviour Analysis, 27, 215–240.
Kazdin, A.E. (1982). Single-Case Research Designs. New York: Oxford University Press.
Mace, F.C. (1996). In pursuit of general behavioural relations. Journal of Applied Behaviour Analysis, 29, 557–563.
Moore, J. (1999). The basic principles of behaviourism. In Thyer, B.A. (Ed.), The Philosophical Legacy of Behaviourism (pp. 41–68). Dordrecht: Kluwer Academic Publishers.
Neef, N.A. & Iwata, B.A. (1994). Current research on functional analysis methodologies: an introduction. Journal of Applied Behaviour Analysis, 27, 211–214.
Pierce, W.D. & Epling, W.F. (1999). Behaviour Analysis and Learning (2nd ed.). Englewood Cliffs, NJ: Prentice Hall, Inc.
Repp, A.C. & Horner, R.H. (Eds.) (1999). Functional Analysis of Problem Behaviour. Belmont, CA: Wadsworth Publishing Company.
Skinner, B.F. (1938). The Behaviour of Organisms. Acton, Massachusetts: Copley Publishing Group.
Skinner, B.F. (1953). Science and Human Behaviour. New York: Free Press.
Sturmey, P. (1996). Functional Analysis in Clinical Psychology. Baffins Lane, Chichester, UK: John Wiley & Sons.
Wacker, D.P. (2000). Building a bridge between research in experimental and applied behaviour analysis. In Leslie, J.C. & Blackman, D.E. (Eds.), Experimental and Applied Analysis of Human Behaviour (pp. 205–234). Reno, Nevada: Context Press.
Wieseler, N.A., Hanzel, T.E., Chamberlain, T.P. & Thompson, T. (1985). Functional taxonomy of stereotypic and self-injurious behaviour. Mental Retardation, 23, 230–234.

Erik Arntzen

Related Entries

Behavioural Techniques, Observational Techniques (General), Observational Techniques in Clinical Setting, Theoretical Perspective: Behavioural

A APPLIED FIELDS: CLINICAL

INTRODUCTION

Psychological assessment is utilized in clinical psychology primarily for purposes of differential diagnosis, treatment planning, and outcome evaluation. Differential diagnosis involves drawing on assessment information to describe an individual’s psychological characteristics and adaptive strengths and weaknesses. These descriptions provide a basis for determining (a) what type of disorder an individual may have, (b) the severity and chronicity of this disorder and the circumstances in which it is likely to be manifest, and (c) the kinds of treatment that are likely to provide the individual relief from this disorder. With respect to further treatment planning, adequate assessment information helps to guide treatment strategies and anticipate possible obstacles to progress in therapy. As for outcome evaluation, pre-treatment assessments establish an objective baseline against which treatment progress can be monitored in subsequent evaluations, and by which the eventual benefits of the treatment can be judged at its conclusion. These clinical contributions of psychological assessment can be implemented during each of four sequential phases in delivering psychological treatment: deciding on therapy, planning therapy, conducting therapy, and evaluating therapy.

DECIDING ON THERAPY

The first step in the clinical utilization of assessment information consists of deciding whether a patient needs treatment and is likely to benefit from it. Accurate differential diagnosis identifies pathological conditions (e.g. depression, paranoia) and maladaptive characteristics (e.g. passivity, low self-esteem) for which treatment is usually indicated, and adequate psychological evaluation helps to distinguish such conditions and characteristics from normal range functioning that does not call for professional mental health intervention. Assessment methods also provide valuable information concerning two factors known to predict whether people are likely to become involved in and profit from psychotherapy: their motivation for treatment and their accessibility to being treated.

Motivation for treatment usually corresponds to the amount of subjectively felt distress that people are experiencing. Accessibility to psychological treatment typically depends on how willing people are to examine themselves, to express their thoughts and feelings openly, and to make changes in their customary beliefs and preferred ways of conducting their lives. Information derived from appropriate assessment procedures can provide clinicians with objective indices of each of these variables, and these assessment data can in turn be used as a basis for determining whether to recommend and proceed with some form of treatment.

PLANNING THERAPY

Planning therapy for patients who need and want to receive psychological treatment involves


(a) deciding on the appropriate setting in which to deliver the treatment, (b) estimating the duration of the treatment, and (c) selecting the particular type of treatment to be given. With respect to deciding on the treatment setting, assessment data provide reliable information concerning the severity of a patient’s disturbance, the patient’s ability to distinguish reality from fantasy, and his or her likelihood of becoming suicidal or dangerous to others, all of which bear on whether the person requires residential care or can be treated safely and adequately as an outpatient. The more severely disturbed people are, the farther out of touch with reality they are, and the greater their risk potential for violence, the more advisable it becomes to care for them in a protected environment.

Regarding treatment duration, clinical experience and research findings consistently indicate that mild and acute problems of recent onset can usually be treated successfully in a shorter period of time than severe and chronic problems of long-standing duration. A variety of psychodiagnostic measures provide clues to the chronicity as well as the severity of symptomatic and characterological mental and emotional problems, and pretreatment data obtained with these measures can accordingly help clinicians formulate some expectation of how long a treatment is likely to last. Having available such assessment-based information on expected duration in turn assists clinicians in presenting treatment recommendations to prospective patients.

As for treatment selection, people who are relatively psychologically minded, self-aware, and interested in gaining fuller self-understanding are relatively likely to respond positively to an uncovering, insight-oriented, and conflict-focused treatment approach. Patients whose preference is to feel better without having to examine themselves closely, on the other hand, are more likely to become actively engaged in supportive and symptom-focused approaches to treatment than in exploratory psychotherapy. Psychologically minded people are inclined to feel dissatisfied with supportive treatment, because it does not get at the root of their problems, whereas relief-minded people tend to feel uncomfortable in uncovering treatment, because it makes unwelcome demands on them. Additionally, there is reason to believe that some kinds of conditions and difficulties, especially in people who are problem-oriented, respond relatively well to cognitive-behavioural forms of treatment, whereas other kinds of disorders and maladaptive tendencies, especially in people who are interpersonally-oriented, respond better to psychodynamic-interpersonal than cognitive-behavioural therapy.

Psychological mindedness and preferences for problem-oriented or interpersonally-oriented approaches to life situations are among a vast array of personality characteristics that can be measured with assessment methods. Accordingly, adequately conceived pretherapy psychological assessment can facilitate treatment planning by differentiating among psychological states and orientations of the individual that have known implications for successful response to particular treatment approaches.

CONDUCTING THERAPY

Psychological assessment can play a key role in conducting therapy by helping to identify in advance: (a) treatment targets on which the therapy should be focused and (b) possible obstacles to progress towards these treatment goals. Appropriately collected assessment data, and particularly the results of a multimethod test battery, typically contain many normal range findings and often some indications as well of notably good personality strengths and especially admirable personal qualities. At the same time, especially in people who are being evaluated for symptoms or difficulties that have led them to seek professional help, test data are likely to reveal specific adaptive shortcomings and coping limitations. One person may show a penchant for circumstantial reasoning and poor judgement; another person may give evidence of poor social skills and interpersonal withdrawal; a third may exhibit considerable emotional inhibition with restricted capacity to express feelings and feel comfortable in emotionally charged situations. In short, any assessment findings that fall outside of an established normal range and are known to indicate specific types of cognitive dysfunction, affective distress, coping deficit, personal dissatisfaction, or interpersonal inadequacy in turn assist therapists and


their patients in deciding on the objectives of their work together and directing their efforts accordingly.

Some psychological characteristics of patients that constitute targets in their treatment may also pose obstacles to their becoming effectively engaged in therapy and making progress toward their goals. For example, people who are set in their ways and characteristically rigid and inflexible in their views often have difficulty reframing their perspectives or modifying their behaviour in response even to well-conceived and appropriately implemented treatment interventions. People who are interpersonally aversive or withdrawn may be slow or reluctant to form the kind of working alliance with their therapist that facilitates progress in most forms of therapy. People who are relatively satisfied with themselves and not experiencing much subjectively felt distress may have little tolerance for the demands of becoming seriously engaged in a course of psychological treatment. Characteristics of these kinds do not preclude effective psychotherapy, but they can result in slow progress, and they may cause patients and therapists to become discouraged and terminate prematurely a treatment that does not appear to be going well. Pretreatment assessment data serve to alert therapists in advance to possible treatment obstacles, which can help them understand and be patient with initially slow progress and also guide them in dealing directly with these obstacles, as by concentrating in the early phases of therapy on encouraging flexibility and open-mindedness, building a comfortable and trusting treatment relationship, or generating some motivation for the patient’s involvement in the therapy.

EVALUATING THERAPY

Psychological assessment provides valuable data for monitoring the progress of therapy and measuring its eventual benefit. For this potential benefit of assessment to be realized, it is vital for assessment data to be collected from patients prior to their beginning treatment. In addition to helping to identify treatment targets and the long-term objectives of therapy, pre-treatment data provide an objective baseline for comparison with the results of subsequent assessments. Periodic re-evaluations can then shed light on whether the treatment is making a difference, how close it has come to meeting its aims, in what way the focus of continued treatment should be adjusted, and whether a termination point has been reached.

For example, if a reliable test index shows abnormally high anxiety, low self-esteem, poor self-control, or excessive anger, and a retest during treatment shows the same or a worse result for any of these treatment targets, there is objective evidence that no progress has been made on this front. Such results can then lead to an informed decision to alter the type or focus of the treatment, change the therapist, or await the next re-assessment before making any change. On the other hand, should retesting show an index closer to an adaptive range than initially, there is reason to conclude that progress is being made on the treatment target related to that index but that further improvement remains to be made in that area. When an initially abnormal test result is found on retesting to be in an adaptive range, then therapists and their patients can conclude with confidence that they have achieved the objective to which this result relates and do not need to address it further. At the point when retesting indicates that most or all of the treatment targets have reached or are approaching as much resolution as could realistically be expected, then the assessment process helps to indicate that an appropriate termination point has been reached.

Assessments conducted at the conclusion of psychotherapy, when compared with initial baseline evaluations, provide an objective basis for evaluating the overall benefit of the treatment that has been provided. Evaluations of treatment benefit made possible by pre-therapy and post-therapy assessments serve important research and practical purposes in clinical psychology. With respect to research issues, assessment data bearing on treatment benefit facilitate comparison studies of the relative effectiveness of different types and modalities of therapy. For practical purposes, retest findings demonstrating treatment benefit bear witness to the value of psychological interventions, particularly as weighed against the financial cost of these services.
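
The baseline-and-retest comparison described above can be read as a simple decision rule, and a short sketch may make its three cases easier to follow. The Python below is illustrative only and is not a published scoring procedure: the names `TargetScore`, `classify_progress`, and `termination_indicated`, the use of a single numeric index per treatment target, the numeric adaptive range, and the 80% cut-off standing in for "most or all" targets are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class TargetScore:
    """Baseline and retest values for one treatment target (hypothetical index)."""
    name: str
    baseline: float
    retest: float
    adaptive_low: float   # assumed lower bound of the adaptive range
    adaptive_high: float  # assumed upper bound of the adaptive range


def distance_from_range(value: float, low: float, high: float) -> float:
    """Return how far a score lies outside the adaptive range (0.0 if inside)."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def classify_progress(target: TargetScore) -> str:
    """Classify one target according to the three cases described in the text."""
    before = distance_from_range(target.baseline, target.adaptive_low, target.adaptive_high)
    after = distance_from_range(target.retest, target.adaptive_low, target.adaptive_high)
    if after == 0.0:
        # Retest falls in the adaptive range: the objective has been met.
        return "objective achieved"
    if after < before:
        # Closer to the adaptive range than at baseline, but not yet inside it.
        return "progress, further improvement needed"
    # Same or worse than baseline: no progress on this target.
    return "no progress"


def termination_indicated(targets: list, share_needed: float = 0.8) -> bool:
    """Return True when enough targets are resolved (the share is an assumption)."""
    if not targets:
        return False
    resolved = sum(1 for t in targets if classify_progress(t) == "objective achieved")
    return resolved / len(targets) >= share_needed


if __name__ == "__main__":
    # Hypothetical scores with an assumed adaptive range of 40 to 60.
    example = [
        TargetScore("anxiety", baseline=75, retest=68, adaptive_low=40, adaptive_high=60),
        TargetScore("self-esteem", baseline=30, retest=45, adaptive_low=40, adaptive_high=60),
        TargetScore("anger", baseline=70, retest=70, adaptive_low=40, adaptive_high=60),
    ]
    for t in example:
        print(t.name, "->", classify_progress(t))
    print("termination indicated:", termination_indicated(example))
```

In practice the judgement of whether a target has reached "as much resolution as could realistically be expected" is a clinical one; the fixed share used here only marks the place in the procedure where that judgement enters.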


WIDELY USED INSTRUMENTS

Surveys of clinical psychologists and the contents of standard handbooks concerning psychological assessment identify several instruments as being among those most widely used by clinicians in the United States for purposes of differential diagnosis, treatment planning, and outcome evaluation. Four of these measures are relatively structured self-report inventories on which conclusions are derived from what respondents are able and willing to say about themselves: the Minnesota Multiphasic Personality Inventory, the Millon Clinical Multiaxial Inventory, the Sixteen Personality Factor Questionnaire, and the Personality Assessment Inventory. Four of them are relatively unstructured performance-based measures in which the key data consist not of what respondents say about themselves but how they deal with various kinds of somewhat ambiguous tasks that are assigned to them: the Rorschach Inkblot Method, the Thematic Apperception Test, several types of figure drawing tasks, and some alternative sentence completion methods.

CONCLUSIONS AND FUTURE PERSPECTIVES

Psychological assessment has been an integral part of clinical psychology since its inception and continues to the present day to provide practitioners with valuable information to guide their evaluation and treatment of persons who seek their help. At times, failure to appreciate the benefits of preceding treatment with thorough assessment has led to insufficient teaching and learning of psychodiagnostic methods by clinical psychologists, as has the regrettable and short-sighted devaluing of diagnostic procedures by health insurance providers. However, the future application of psychodiagnostic methods in clinical psychology appears to rest safely in the hands of practitioners and researchers who know from their experience and data how useful assessment can be in facilitating good clinical decisions.

References

Beutler, L.E. & Harwood, T.M. (1995). How to assess clients in pre-treatment planning. In Butcher, J.N. (Ed.), Clinical Personality Assessment (pp. 59–77). New York: Oxford.
Blatt, S.L. & Ford, R.Q. (1994). Therapeutic Change. New York: Plenum.
Garfield, S.L. (1994). Research on client variables in psychotherapy. In Bergin, A.E. & Garfield, S.L. (Eds.), Handbook of Psychotherapy and Behaviour Change (4th ed.; pp. 190–228). New York: Wiley.
Greencavage, L.M. & Norcross, J.C. (1990). What are the commonalities among the therapeutic factors? Professional Psychology, 21, 372–378.
Hayes, S.C., Nelson, R.O. & Jarrett, R.B. (1987). The treatment utility of assessment. American Psychologist, 42, 963–974.
Horvath, O. & Greenberg, L.S. (Eds.) (1994). The Working Alliance. New York: Wiley.
Hurt, S.W., Reznikoff, M. & Clarkin, J.F. (1991). Psychological Assessment, Psychiatric Diagnosis, & Treatment Planning. New York: Brunner/Mazel.
Kubiszyn, T.W., Meyer, G.J., Finn, S.E., Eyde, L.D., Kay, G.G., Moreland, K.L., Dies, R.R. & Eisman, E.J. (2000). Empirical support for psychological assessment in clinical health care settings. Professional Psychology, 31, 119–130.
Maruish, M.E. (Ed.) (1999). The Use of Psychological Testing for Treatment Planning and Outcome Assessment (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Shectman, F. & Smith, W.H. (Eds.) (1984). Diagnostic Understanding and Treatment Planning. New York: Wiley.
Weiner, I.B. (1983). The future of psychodiagnosis revisited. Journal of Personality Assessment, 47, 451–461.
Weiner, I.B. & Exner, J.E. (1991). Rorschach changes in long-term and short-term psychotherapy. Journal of Personality Assessment, 56, 453–465.

Irving B. Weiner

Related Entries

Applied Behavioural Analysis, Child and Adolescent Assessment in Clinical Settings, Clinical Judgement, Couple Assessment in Clinical Settings, Interview in Behavioural Clinical and Health Settings, Observational Techniques in Clinical Setting, Goal Attainment Scaling, Psychophysiological Equipment and Measurements, Outcome Assessment/Outcome Treatment, Prediction: Clinical vs. Statistical


A APPLIED FIELDS: EDUCATION

INTRODUCTION

The role of assessment and evaluation in education has been crucial, probably since the earliest approaches to formal education. However, change in this role has been dramatic in the last few decades, largely due to wider developments in society. The most dramatic change in our views of assessment is represented by the notion of assessment as a tool for learning. Whereas in the past, we have seen assessment only as a means to determine measures and thus certification, there is now a realization that the potential benefits of assessing are much wider and impinge on all stages of the learning process. In this contribution, we will outline some of the major developments in educational assessment, and we will reflect on the future of education within powerful learning environments, where learning, instruction and assessment are more fully integrated.

Consequences of the Developments in Society

Economic and technological change, which brings significant changes in the requirements of the labour market, poses increasing demands on education and training. For many years, the main goal of education has been to make students knowledgeable within a certain domain. Building a basic knowledge store was the core issue. Students taking up positions in modern organizations nowadays need to be able to analyse information, to improve their problem-solving skills and communication and to reflect on their own role in the learning process. People increasingly have to be able to acquire knowledge independently and use this body of organized knowledge in order to solve unforeseen problems. As a consequence, education should contribute to the education of students as lifelong learners.

Paradigm Change: From Testing Towards Assessment

Many authors (Mayer, 1992; De Corte, 1990) have pointed to the importance of instruction to promote students’ abilities as thinkers, problem-solvers and inquirers. Underlying this goal is the view that meaningful understanding is based on the active construction of knowledge and often involves shared learning. It is argued that a new form of education requires reconsideration of assessment (Dochy, Segers & Sluijsmans, 1999). Changing towards new forms of learning, with a status quo for evaluation, undermines the value of innovation. Students do not invest in learning that will not be honoured. Assessment is the most determining factor in education for the learning behaviour of students. Traditional didactic instruction and traditional assessment of achievement are not suited to the modern educational demands. Such tests were generally designed to be administered following instruction, rather than to be integrated with learning. As a consequence, due to their static and product-oriented nature, these tests not only lack diagnostic power but also fail to provide relevant information to assist in adapting instruction appropriately to the needs of the learner (Campione & Brown, 1990; Dochy, 1994). Furthermore, standard test theory characterizes performance in terms of the difficulty level of response choice items and focuses primarily on measuring the amount of declarative knowledge that students have acquired.

This view of performance is at odds with current theories of cognition. Achievement assessment must be an integral part of instruction, in that it should reflect, shape, and improve student learning. Assessment procedures should not only serve as a tool for crediting students with recognized certificates, but should also be used to monitor progress and, if needed, to direct students to remedial learning activities. The view that the evaluation of students’ achievements is


something which happens at the end of the process of learning is no longer widespread; assessment is now represented as a tool for learning (Dochy & McDowell, 1997).

The changing learning society has generated the so-called assessment culture as an alternative to the testing culture. The assessment culture strongly emphasizes the integration of instruction and assessment. Students play far more active roles in the evaluation of their achievement. The construction of tasks, the development of criteria for the evaluation of performance, and the scoring of the performance may be shared or negotiated among teachers and students. The assessment takes all kinds of forms such as observations, text- and curriculum-embedded questions and tests, interviews, performance assessments, writing samples, exhibitions, portfolio assessment, and project and product assessments. Several labels have been used to describe subsets of these alternatives, with the most common being ‘direct assessment’, ‘authentic assessment’, ‘performance assessment’ and ‘alternative assessment’.

New Methods of Assessment

Investigations of new approaches (e.g. Birenbaum & Dochy, 1996; Nitko, 1995; Shavelson et al., 1996) illustrate the development of more ‘in context’ and ‘authentic’ assessment (Archbald & Newmann, 1992; Hill, 1993). Nisbet (1993) defines the term authentic assessment as ‘methods of assessment which influence teaching and learning positively in ways which contribute to realizing educational objectives, requiring realistic (or “authentic”) tasks to be performed and focusing on relevant content and skills, essentially similar to the tasks involved in the regular learning processes in the classroom’ (p. 35).

Assessment of such ‘authentic’ tasks is highly individual and contextualized. The student gets feedback about the way he solved the task and about the quality of the result. Evaluation is given, on the basis of different ‘performance tasks’, performed and (reviewed) assessed at different moments. The evaluation criteria have to be known in advance. When students know the criteria and know how to reach them, they will be more motivated and achieve better results. This form of evaluation gives a more complete and realistic picture of the student’s ability (achievement). It evaluates not only the product, but also the process of learning. Students get feedback about their incorrect thinking strategies.

Within the new forms of ‘new assessment’, much attention is paid to authentic problem-solving, case-based exams, portfolios and the use of co-, peer-, and self-assessment (Birenbaum, 1996).

In traditional education, the question ‘who takes up the exam and who defines the criteria’ is seldom asked. Most of the time, it is the teacher. New forms of education do pose this question. Students themselves, other students or the teacher and students together are responsible for assessment. The type of student self-assessment referred to most frequently in the literature is a process that involves teacher-set criteria and in which students themselves carry out the assessment and marking. Another form of student self-assessment is the case where a student assesses herself or himself, on the basis of criteria which she or he has selected, the assessment being either for the student’s personal guidance or for communication to the teacher or others. According to Hall (1995) there are two critical factors for genuine self-assessment: the student not only carries out the assessment, but also selects the criteria on which the assessment is based. Similarly, peer-assessment can indicate that fellow students both select the criteria and carry out the assessment. Any situation where the tutor and students share in the selection of criteria and/or the carrying-out of the assessment is more accurately termed co-assessment (Hall, 1995). However, it is still frequently the case that teachers control the assessment process, sometimes assisted by professional bodies or assessment experts, whereas students’ assessments and criteria are taken seriously but considered to be additional to the assessment undertaken by the teacher or professor rather than replacing it (Rogers, 1995). Implementing forms of self-, peer- and co-assessment may decrease the time-investment professors would otherwise need to make in more frequent assessment. In addition to that advantage, using these assessment forms assists the development of certain skills for the students, e.g. communication skills, self-evaluation skills, observation skills, self-criticism.


ASSESSING NEW ASSESSMENT FORMS: DEVELOPMENTS IN EDUMETRICS

Judgements regarding the cognitive significance of an assessment begin with an analysis of the cognitive requirements of tasks as well as the ways in which students try to solve them (Glaser, 1990). Two criteria by which educational and psychological assessment measures are commonly evaluated are validity and reliability. One can say that based on these criteria, the results above are not yet consistent and depending upon the assessment form there is a larger or smaller basis to state that the evaluation is acceptable. It is however important to note that Birenbaum (1996) mentions that the meaning of validity and reliability has recently expanded. Dissatisfaction with the available criteria, which were originally developed to evaluate indirect measures of performance, is attributed to their insensitivity to the characteristics of a direct assessment of performance.

The most important element of new assessment models is the reflection of the competencies required in real-life practice. The goal is to ensure that the success criteria of education or training processes are the same as those used in the practice setting. Hence, as notions of fitness for purpose change, and as assessment of more qualitative areas is developed, the concepts of validity and reliability encompassed within the instruments of assessment must also change accordingly. This means that we should widen our view and search for other and more appropriate criteria. It should not be surprising that a new learning society and consequently a new instructional approach and a new assessment culture cannot be evaluated solely on the basis of the criteria of the previous era.

Validity Related Issues

Although performance assessment appears to be a valid form of assessment, in that it resembles meaningful learning tasks, this measure may be no more valid than scores derived from response choice items (Linn et al., 1991). Evidence is needed to ensure that assessments require the high-level thought and reasoning processes that they were intended to evoke.

The authors of the 1985 Standards define test validity as ‘a unitary concept, requiring multiple lines of evidence, to support the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores’ (AERA, APA, NCME, 1985: 9). All validity research should be guided by the principles of scientific inquiry reflected in construct validity.

Within the construct validity framework, almost any information gathered in the process of developing and using an assessment is relevant when it is evaluated against the theoretical rationale underlying the proposed interpretation and the inferences made from test scores (Moss, 1995). Thus, validation embraces all the experimental, statistical and philosophical means by which hypotheses are evaluated. Validity conclusions, then, are best presented in the form of an evaluative argument, which integrates evidence to justify the proposed interpretation against plausible alternative interpretations.

Kane’s argument-based approach is in line with Cronbach’s view on validity. According to Kane (1992), to validate a test-score interpretation is to support the plausibility of the corresponding interpretative argument with appropriate evidence: (1) for the inferences and assumptions made in the proposed interpretative argument and (2) for refuting potential counter arguments. The core issue is not that we must collect data to underpin validity, but that we should formulate transparent, coherent, and plausible arguments to underpin validity.

Authors like Kane and Cronbach use validity principles from interpretative research traditions, instead of psychometric traditions, to assist in evaluating less-standardized assessment practices.

Other criteria suggested for measuring validity of new assessment forms are the transparency of the assessment procedure, the impact of assessment on education, directness, effectiveness, fairness, completeness of the domain description, practical value and meaningfulness of the tasks for candidates, and authenticity of the tasks (Haertel, 1991). According to Messick (1994), these validation criteria are, in a more sophisticated form, already part of the unifying concept of validity, which he expressed in 1989. He asserted that validity is an evaluative summary of both evidence for and the actual as well as potential consequences of score interpretation and use. The more traditional conception of


validity as ‘evidence for score interpretation and use’ fails to take into account both evidence of the value implications of score interpretation and the social consequences of score use.

Messick’s unifying concept of validity encompasses six distinguishable parts – content, substantive, structural, external, generalizability, and consequential aspects of construct validity – that conjointly function as general criteria for all educational and psychological assessment. The content aspect of validity means that the range and type of tasks used in assessment must be an appropriate reflection (content relevance, representativeness) of the construct-domain. Increasing achievement levels in assessment tasks should reflect increases in expertise of the construct-domain. The substantive aspect emphasizes the consistency between the processes required for solving the tasks in assessment, and the processes used by domain-experts in solving tasks (problems). Further, the internal structure of assessment – reflected in the criteria used in assessment tasks, the interrelations between these criteria and the relative weight placed on scoring these criteria – should be consistent with the internal structure of the construct-domain. If the content aspect (relevance, representativeness of content and performance standards) and the substantive aspect of validity are guaranteed, score interpretation based on one assessment task should be generalizable to other tasks assessing the same construct. The external aspect of validity refers to the extent that the assessment scores’ relationships with other measures and non-assessment behaviours reflect the expected high, low, and interactive relations. The consequential aspect of validity includes evidence and rationales for evaluating the intended and unintended consequences of score interpretation and use (Messick, 1994).

In line with Messick’s conceptualization of consequential validity, Frederiksen and Collins (1989) proposed that assessment has ‘systemic validity’ if it encourages behaviours on the part of teachers and students that promote the learning of valuable skills and knowledge, and allows for issues of transparency and openness, that is, access to the criteria for evaluating performance. Encouraging deep approaches to learning is one aspect which can be explored in considering the consequences. Another is the impact which assessment has on teaching. Dochy and McDowell (1997) argue that assessing high-order skills by means of authentic assessment will lead to the teaching of such high-order knowledge and skills.

With today’s emphasis on high-stakes assessment, two threats to test validity are worth mentioning: construct-underrepresentation and construct-irrelevance variance. In the case of construct-irrelevance variance, the assessment is too broad, containing systematic variance that is irrelevant to the construct being measured. The threat of construct-underrepresentation means that the assessment is too narrow and fails to include important dimensions of the construct being measured.

Special Points of Attention For New Assessment Forms

The above implies in our view that other criteria suggested for measuring validity of new assessment forms will need to be taken into account, i.e. the transparency of the assessment procedure, the impact of assessment on education, directness, effectiveness, fairness, completeness of the domain description, practical value and meaningfulness of the tasks for candidates, and authenticity of the tasks.

In addition, predictable difficulties will have to be taken into account, such as those outlined in the following paragraphs.

Authentic assessment tasks are more sensitive to construct-underrepresentation and construct-irrelevance variance, because they are often loosely structured, so that it is not always clear to which construct-domain inferences are drawn. Birenbaum (1996) argues that it is important to specify accurately the domain and to design the assessment rubrics so that they clearly cover the construct-domain. Messick (1994) advises adopting a construct-driven approach to the selection of relevant tasks and the development of scoring criteria and rubrics, because it makes salient the issues of construct-underrepresentation and construct-irrelevance variance.

Another difficulty with authentic tasks, with regard to validity, concerns the rating of authentic problems. Literature reveals that there is much variability between raters in scoring the quality of a solution. Construct-underrepresentation in rating is manifested as omission of assessment

[8.8.2002–12:29pm] [1–128] [Page No. 56] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword Applied Fields: Education 57

criteria or ideosyncratic weighting of criteria such approach that limits human judgement ‘to single that some aspects of performance do not receive performances’, results of which are then aggre- sufficient attention. Construct-irrelevance var- gated and compared with performance standards. iance can be introduced by rater’s application of extraneous, irrelevant or idiosyncratic criteria (Heller et al., 1998). Suggestions for dealing with CONCLUSION: FUTURE RESEARCH these problems in literature include constructing AND DEVELOPMENTS guidelines, using multiple raters and selecting and training raters. The assessment culture leads to a change in our instructional system from a system that transfers knowledge into students’ heads to one that tries Reliability Related Issues to develop students who are capable of learning how to learn. The current societal and techno- Reliability in classical tests is concerned with the logical context requires education to make such degree in which the same results would be a change. The explicit objective is to interweave obtained on a different occasion, in a different assessment and instruction in order to improve context or by a different assessor. Inter- and education. A number of lessons can be learned intrarater agreement is used to monitor the from the early applications of new assessment technical soundness of performance assessment programs. rating. However, when these conventional criteria First, one should not throw the baby out with are employed for new assessments (for example the bath water. Objective tests are very useful for using authentic tasks), results tend to compare certain purposes, such as high-stake summative unfavourably to traditional assessment, because assessment of an individual’s achievement, of a lack of standardization of these tasks. although they should not dominate an assess- The unique nature of new forms of assessment ment program. 
Increasingly, measurement specia- has affected the traditional conception of relia- lists recommend the so-called balanced or bility, resulting in the expansion of its scope and pluralistic assessment programs, where multiple a change in weights attached to its various assessment formats are used. There are several components (Birenbaum, 1996). In new assessment motives for these pluralistic assessment programs forms, it is not about achieving a normally (Birenbaum, 1996; Messick, 1984): a single distributed set of results. The most important assessment format cannot serve several different question is to what extent the decision ‘whether or purposes and decision-makers; and each assess- not individuals are competent’ is dependable ment format has it own method variance, which (Martin, 1997). Differences between ratings some- interacts with persons. times represent more accurate and meaningful There is a need to establish a system of measurement than would absolute agreement. assessing the quality of new assessment and Measures of interrater reliability in authentic implement quality control. Various authors have assessment then, do not necessarily indicate recently proposed ways to extend the criteria, whether raters are making sound judgement and techniques and methods used in traditional do not provide bases for improving technical psychometrics. Others, like Messick (1995), quality. Measuring the reliability of new forms of oppose the idea that there should be specific assessment stresses the need for more evidence in criteria, and claim that the concept of construct a doubtful case, rather than to rely on making validity applies to all educational and psycho- inferences from a fixed and predetermined set of logical measurements, including performance data (Martin, 1997). assessment. In line with these views on reliability is Moss’ idea (1992) about reliability. 
She asserts that a hermeneutic approach of ‘integrative interpreta- References tions based on all relevant evidence’ is more appropriate for new assessment because it American Educational Research Association, includes the value and contextualized know- American Psychological Association, National ledge of the reader, than the psychometric Council on Measurement in Education. (1985).

[8.8.2002–12:29pm] [1–128] [Page No. 57] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword 58 Applied Fields: Education

Standards for Educational and Psychological Test- presented to the VCTA Comview Conference, ing. Washington, D.C. Melbourne, 1 December. Archbald, D.A. & Newmann, F.M. (1992). Ap- Kane, M. (1992). An argument-based approach to proaches to assessing academics achievement. In validity. Psychological Bulletin, 112, 527–535. Berlak, H., Newmann, F.M., Adams, E., Archbald, Linn, R.L., Baker, E. & Dunbar , S. (1991). Complex, D.A., Burgess, T., Raven, J. & Roberg, T.A. (Eds.), performance-based assessment: expectations and Toward a New Science of Educational Testing and validation criteria. Educational Researcher, 20(8), Assessment. (pp. 139–180). Albany: State University 15–21. of New York Press. Martin, S. (1997). Two models of educational Birenbaum, M. (1996). Assessment 2000: Towards a assessment: a response from initial teacher educa- pluralistic approach to assessment. In Birenbaum, tion: if the cap fits. Assessment and Evaluation in M. & Dochy, F. (Eds.), Alternatives in Assessment Higher Education, 22(3), 337–343. of Achievements, Learning Processes and Prior Mayer, R.E. (1992) Thinking, Problem Solving, Knowledge. Boston: Kluwer Academic. Cognition (2nd ed.). New York: Freeman. Birenbaum, M. & Dochy, F. (Eds.) (1996). Messick, S. (1984). The psychology of educational Alternatives in Assessment of Achievements, measurement. Journal of Educational Measurement, Learning Processes and Prior Knowledge. 21, 215–238. Boston: Kluwer Academic. Messick, S. (1994). The interplay of evidence and Campione, J.C. & Brown, A.L. (1990). Guided learning consequences in the validation performance assess- and transfer : Implications for approaches to assess- ments. Educational Researcher, 23(2), 13–22. ment. In Frederiksen, N., Glaser, R., Lesgold, A.A. & Messick, S. (1995). Validity of psychological assess- Shafto, M.G. (Eds.), Diagnostic Monitoring of Skill metn. American , 50(9), 741–749. and Knowledge Acquisition (pp. 141–172). Hillsdale, Moss, P.A. (1992). 
Shifting conceptions of validity in NJ: Lawrence Erlbaum Associates. educational measurement: implications for perfor- De Corte E. (1990). Toward powerful learning mance assessment. Review of Educational Research, environments for the acquisition of problem solving 62(3), 229–258. skills. European Journal of Psychology of Education Moss, P.A. (1995). Themes and variations in validity 5(1), 519–541. theory. Educational Measurement, 2, 5–13. Dochy, F. (1994). Prior knowledge and learning. In Nisbet, J. (1993). Introduction. In OECD-Curriculum Huse´n, T. & Postlethwaite, T.N. (Eds.), Interna- Reform: Assessment in Question, 25–38. Paris: tional Encyclopedia of Education (2nd. ed., Organisation for Economic Cooperation and Devel- pp. 4698–4702). Oxford/New York: Pergamon opment. Press. Nitko, A. (1995) Curriculum-based continuous assess- Dochy, F. & McDowell, L. (1997). Introduction: ment: a framework for concepts, procedures and assessment as a tool for learning. Studies in policy. Assessment in Education, 2, 321–337. Educational Evaluation 23(4), 279–298. Rogers, P. (1995). Validity of Assessments. Contribu- Dochy, F., Segers, M. & Sluijsmans, D. (1999). The use of tion to the 2nd EECAE Conference (European self-, peer and co-assessment in higher education: a Electronic Conference on Assessment and Evalua- review. Studies in Higher Education, 24(3), 331–350. tion), EARLI-AE list, March 10–14. Frederiksen, J.R. & Collins, A. (1989). A system Shavelson, R.J., Xiaohong, G. & Baxter, G. approach to educational testing. Educational (1996). On the content validity of performance Researcher 18(9), 27–32. assessments: centrality of domain-specifications. Glaser, R. (1990). Testing and Assessment; O In Birenbaum, M. & Dochy, F. Alternatives in Tempora! O Mores! Horace Mann Lecture, Uni- Assessment of Achievements, Learning Processes versity of Pittsburgh, LRDC, Pittsburgh, Pennsylva- and Prior Knowledge (pp. 131–142). Boston: nia. Kluwer Academic. Haertel, E.H. (1991). 
New forms of teacher assessment. In Grant, G. (Ed.), Review of Research in Filip Dochy Education, 17, 3–29. Hall, K. (1995). Co-assessment: Participation of Students with Staff in the Assessment Process. A Report of Work in Progress. Paper given at the 2nd Related Entries European Electronic Conference on Assessment and Evaluation, EARLI-AE list European Academic & Research Network (EARN) (listserv.surfnet.nl/ar- Diagnostic Testing in Educational Settings, chives/earli-ae.html). Evaluation in Higher Education (Includes Heller, J.I., Sheingold, K. & Myford, C.M. (1998). Evaluating Quality in Higher Education), In- Reasoning about evidence in portfolios: cognitive structional Strategies, Learning Disabilities, foundations for valid and reliable assessment. Learning Strategies, Psychoeducational Test Educational Assessment, 5(1), 5–40. Batteries, Reporting Test Results in Education, Hill, P.W. (1993). Profiles and the VCE: Authentic Standard for Educational and Psychological Assessment in a High Stakes Environment. Paper Testing
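
Supplementary illustration for the section on Reliability Related Issues: the interrater agreement mentioned there can be quantified in several ways. The sketch below is not taken from the works cited in this entry. It assumes two raters, categorical judgements, and Cohen's kappa as the chance-corrected index; the ratings and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases on which the two raters give the same judgement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same, non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    categories = set(count_a) | set(count_b)
    # Chance agreement: product of the raters' marginal proportions per category.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical competence judgements on six portfolios (illustration only).
rater_a = ["competent", "competent", "not yet", "competent", "not yet", "competent"]
rater_b = ["competent", "not yet", "not yet", "competent", "not yet", "competent"]

print(f"percent agreement: {percent_agreement(rater_a, rater_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")
```

In line with the caveat in the entry, such an index only describes how often judgements coincide. It does not show whether either rater's judgement is sound.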


A APPLIED FIELDS: FORENSIC

INTRODUCTION

Psychological forensic assessment aims to contribute to rational problem-solving in a forensic context when judgements have to be made about conditions or consequences of human behaviour brought to (criminal or civil) court. We describe a decision-oriented model of the process of psychological assessment that can serve as a general framework for psychological assessment concerning forensic questions. Frequently asked forensic questions relate to (1) psychological problems of parental custody and contact with children after divorce, (2) credibility of witness statements, and (3) prognosis of offence recidivism.

GENERAL CONCEPT

Modern psychological forensic assessment is conceived as an aid for optimizing forensic problem solving in a scientific process of hypothesis-testing. The assessment process can be regarded as a sequence of decisions. Decisions during planning have a crucial impact on assessment results: mistakes in planning may cause invalid results. Additionally, many decisions must be made while realizing the assessment plan and combining the data into results. Explicit rules to aid these decisions are explained and compiled in checklists by Westhoff and Kluck (1998).

This approach is in contrast to the – outdated – trait-oriented comprehensive ‘portraying’ of the personality. According to this general concept, it is not the personality that has to be evaluated, but the conditions and the course of a person’s actions, or the relations between individuals, in the past, present and in the future. There are six sets of conditions influencing human behaviour: (1) environment; (2) organism; (3) cognition; (4) emotion; (5) motivation; and (6) social variables; and their interactions.

In a single case, all empirically relevant conditions and behavioural variables are checked for their contributions to the forensic question put to the psychological expert. In order to test the resulting hypotheses, different sources of information have to be selected, e.g. according to their psychometric properties. Data can be gathered from systematically planned interviews, observation of behaviour, biographical files and standardized procedures (e.g. tests or questionnaires). Assessors balance the costs of a special assessment procedure, e.g. a test, and its benefits. Of course, they take into consideration not only material, but also immaterial costs and benefits for each participant in the assessment process.

A competent realization of the assessment plan requires the up-to-date knowledge and skills of a well-trained psychologist. This expert will use the most objective methods of documentation, e.g. tape recording of interviews.

Data from all relevant sources of information are weighted according to the single case and combined in order to reach a decision about each of the initial hypotheses. In a second step the outcomes of these decisions are integrated, in order to answer the forensic question(s) posed by the judicial system. The conclusions are always stated as probabilistic ‘if–then’ statements.

The structure of a psychological report according to this assessment process corresponds to the international scientific publication standards and the Guidelines for the Assessment Process (GAP) of the European Association of Psychological Assessment (Fernández-Ballesteros et al., 2001):

1 Client’s question (and client)
2 Psychological questions (= hypotheses)
3 Plan and sequence of the investigation (including the names of all investigators, all appointments; duration and locations of meetings)
4 Data
5 Results
6 Recommendations and suggestions (if asked for in the client’s question)
7 References
8 Appendix (including psychometric calculations)
9 Signature (of the responsible psychologist)

JUDICIAL SYSTEM AND FORENSIC QUESTIONS PUT TO THE PSYCHOLOGICAL EXPERT

The roles and the tasks of all the participants in legal proceedings differ according to the different judicial systems in Western societies. Consequently, the questions put to forensic psychological experts, and their working conditions, differ as well. Nevertheless, there are common basic forensic-psychological concepts and methods. The following sections will deal with them. They will be illustrated by sketches of the forensic questions most frequently put to psychological experts.

Psychological Reports in Family Law

Writing a psychological report on questions of parental custody and contact of parents with their children after divorce is a very complex task which, primarily, needs thorough planning. Preparation of such a report aims to support the parents’ readiness for communication with each other and their educational competence. The results of the psychological expert’s work help the judge at the family court to decide in the ‘best interest of the child’. Psychological experts optimize this assessment process by using explicit rules. Westhoff, Terlinden-Arzt, and Klueber (2000) explain every single decision that has to be made in this process. Additionally they give checklists containing rules to help avoid errors and mistakes and to minimize judgement biases.

To enable the parents and/or the judge to decide in the ‘best interest of the child’ requires the operationalization of this hypothetical construct. The psychological expert has to test the following sets of (psychological) variables:

1 the personal attachments of the child;
2 the continuity of personal care and the continuity of the environment of the child;
3 fostering the development of the child;
4 the attitudes of the child to possible solutions;
5 parents’ readiness for communication with each other regarding the child;
6 their readiness to support the personal attachments of the child;
7 strategies of the family to cope with their divorce-related problems.

The psychological expert has to select the most useful, objective, reliable, and valid instruments for gathering the necessary data. There are only very few standardized procedures that match the questions asked by the family court. Most of the relevant data for psychological assessment in family court problems are obtained from systematic, partly standardized interviews and from the systematic observation of relevant behaviour (e.g. ‘the strange situation’ designed by Ainsworth et al. (1978) for the assessment of attachment quality). The Family Relations Test by Bene and Anthony (1985) can be very useful as a supporting instrument for the systematic interviewing of even young children: it helps the children to verbalize their incoming and outgoing emotions about each member of their family. The still widely used projective techniques as well as trait-oriented personality questionnaires are not validated for answering family court questions: the constitutional right to have or to rear children is not limited by a particular degree of any personality trait. Therefore, personality trait scores cannot be meaningful criteria in deciding with which of the parents the children should live or whether they should have contact with the other party.

Statement Credibility

In criminal investigation, psychological experts may be asked to assess the credibility of statements by witnesses of a crime. Expert knowledge is mainly required in cases of sexual abuse and maltreatment or other violent crimes, especially when children are victims and/or witnesses of such offences and where there is no other evidence than the victim’s/witness’s statement. Nevertheless, the principal logic and the basic procedure of conducting an expert assessment are not limited to minors or to particular kinds of crime.


The assessment process here is again a hypothesis-testing procedure: starting with the assumption that the statement is not based on a real-life experience of the witness, the expert has to look out for data that rule out this hypothesis. Only if there is strong evidence for the alternative hypothesis, ‘the statement is based on an experienced real-life event’, can this alternative hypothesis be accepted. In contrast to this, the presupposition that the alleged event has actually occurred would only need very weak supporting evidence to be accepted and would therefore lead to an extreme false-positive bias.

Assessing the credibility of a witness’s statement does not rely on ‘general trustworthiness’ as a kind of personality construct, but refers only to the assessment of the veracity of the specific testimony in a particular case. The general question of credibility assessment can, therefore, be stated as follows: ‘Is this individual witness, under the given conditions of the investigation and the possible influences of other people, capable of making this particular statement even if it is not based on real-life experience?’ (translated from Steller & Volbert, 1999).

The basic working hypothesis for analysing the content of a witness’s statement was developed by Undeutsch (1967); it says that a statement that is based on real-life experience differs systematically from one that lacks this experience. For credibility assessment, this means that the witness’s statement has to be analysed according to quality criteria applied to its content, which differentiate between reality-based statements and others. Reality criteria have been described since the beginning of the 20th century in German psychological and juridical literature. Undeutsch (1967) was the first to describe a comprehensive set of reality criteria. Steller and Koehnken (1989) refer to former approaches proposed by several authors and describe a system of five categories of reality criteria (p. 221); these are:

1 general characteristics: logical structure, unstructured production, quantity of details;
2 specific contents: contextual embedding; descriptions of interactions; reproduction of conversation, unexpected complications during the incident;
3 peculiarities of content: unusual details, superfluous details, accurately reported details misunderstood, related external associations, accounts of subjective mental state; attribution of perpetrator’s mental state;
4 motivation-related contents: spontaneous corrections, admitting lack of memory, raising doubts about one’s own testimony, self-deprecation; pardoning the perpetrator;
5 elements specific to the offence: details characteristic of the offence.

This integrative expert system has acquired an enhanced theoretical foundation (Ceci & Bruck, 1995). During the last fifteen years many studies of empirical validation of the system have been conducted in field studies as well as in experimental studies. The criteria system has turned out to be a useful assessment instrument for scientific research and for practical use in assessing the credibility of a witness’s statement.

Criteria Based Content Analysis (CBCA) can only lead to a valid credibility assessment if it takes into account certain characteristics of the witness as preconditions for a reliable and valid testimony. These are perception parameters, memory conditions and verbalization. In addition, there are motivational aspects to be considered, like readiness to testify, goals, expectations, desires and fears connected with giving true or false testimony.

Furthermore there must be a test of whether there are or ever have been situational conditions that influence the statements so that they can even be made without that particular experience in real life. Statements by very young children in particular are susceptible to inductive and suggestive influences and questions, whether these are intentional or unintentional. Therefore, the ‘history’ of the statement and its development has to be explored, as well as the cognitive, emotional, and social developmental status of a young child witness.

The complete process of credibility assessment described here is called Statement Validity Assessment (SVA). In 1999, the Federal Supreme Court of Germany decided that expert opinions on the credibility of (child) witnesses are not acceptable in forensic contexts unless they meet the standards of an SVA (Bundesgerichtshof, 1999).

Appropriate data for testing the above hypotheses for SVA are mainly obtained from biographical interviews; psychometric tests would have to be selected with regard to their ecological validity for the special aspects of the abilities in the forensic context mentioned above. While severe limitations of sensory perception and developmental delays can be easily observed or assessed by psychometric or otherwise standardized methods, an appropriate test of ‘memory’ for SVA would have to test ‘episodic’ memory; a test of ‘logical thinking’ would have to refer to ‘understanding social context’. Special tests of this kind are not yet available.

Consequently, the most important procedure for gathering data to run an SVA is a non-suggestive, systematic interview of the witness (for interviewing strategies, see Milne & Bull, 1999). Observation of overt behaviour can be helpful in certain aspects, but most nonverbal cues (e.g. facial expression or illustrators during speaking) are ambiguous with regard to the veracity of a witness’s statement (Koehnken, 1990).

Prognosis of Offender Recidivism

Predicting the risk of recidivism of criminal offenders can very much influence the sentence and – in the case of mentally disordered offenders – the kind and duration of correctional treatment. This prediction task has to balance the severe consequences of false positive and of false negative judgements, both from the viewpoint of the individual offender and of the community.

Prognoses of offender recidivism are fraught with many specific and difficult problems: absolute certainty cannot be achieved, for logical reasons; the available data for prediction are incomplete; the only data about recidivism risks are those obtained about the individual offender; the important situational conditions can only be vaguely rated.

The process of psychological (and/or psychiatric) prognosis requires four steps of assessment (Rasch, 1999; Dahle, 1999): (1) analysis of the former criminal offences of the individual; (2) assessment of his present mental state (including possible mental disorders or illnesses); (3) analysis of the psychological development of the offender since the latest offence; (4) the general framework (situations, persons, chances) of his prospective living conditions. All these criteria are assessed according to the base rate of individuals for whom a similar constellation of conditions is observed.

Data for this prognosis task come from prison, hospital or therapy records, and from some standardized psychodiagnostic questionnaires which have proven themselves as being reliable and valid predictors for criminal recidivism (such as the HCR-20 by Webster et al., 1994 and the Level of Service Inventory (LSI-R) by Andrews et al., 1995). Nevertheless, the most important method is the systematic interview with the offender based on the topics of the prognosis criteria.

OTHER TOPICS OF FORENSIC ASSESSMENT

The three topics of forensic assessment described here are only examples. In different countries there are many other forensic questions that are put to the psychological expert. These concern for example: (1) assessment of criminal responsibility, (2) ‘lie detection’ by psychophysiological methods, (3) assessment of the effects of victimisation, and (4) (other) special problems in civil law. The structure of the assessment process described above does not differ, however, for any forensic question whatsoever put to the forensic psychological expert.

References

Ainsworth, M.D.S., Blehar, M.C., Waters, E. & Wall, S. (1978). Patterns of Attachment: A Psychological Study of the Strange Situation. Hillsdale, NJ: Erlbaum.
Andrews, D.A. (1995). The psychology of criminal conduct and effective treatment. In McGuire, J. (Ed.), What Works: Reducing Reoffending (pp. 35–62). Chichester: Wiley.
Bene, E. & Anthony, J. (1985). Family Relations Test, Children’s Version, 1985 Revision. Windsor: The NFER – Nelson Publishing Co. Ltd.
Bundesgerichtshof (1999). ‘Wissenschaftliche Anforderungen an aussagepsychologische Begutachtungen (Glaubhaftigkeitsgutachten). BGH, Urteil vom 30.7.1999 – 1 StR 618/98 (LG Ansbach)’, Neue Juristische Wochenschrift, 2746–2751.
Ceci, S.J. & Bruck, M. (1995). Jeopardy in the Courtroom. Washington, DC: APA.
Dahle, K.-P. (1999). Psychologische Begutachtung zur Kriminalprognose. In Kröber, H.-L. & Steller, M. (Eds.), Psychologische Begutachtung im Strafverfahren (pp. 77–111). Darmstadt: Steinkopff.
Fernández-Ballesteros, R., De Bruyn, E.E.J., Godoy, A., Hornke, L.F., Ter Laak, J., Vizcarro, C., Westhoff, K., Westmeyer, H. & Zaccagnini, J.L. (2001). Guidelines for the assessment process (GAP): a proposal for discussion. European Journal of Psychological Assessment, 17(3), 178–191.
Koehnken, G. (1990). Glaubwürdigkeit. München: Psychologie Verlags Union.
Milne, R. & Bull, R. (1999). Investigative Interviewing – Psychology and Practice. New York: Wiley.
Rasch, W. (1999). Forensische Psychiatrie (2nd ed.). Stuttgart: Kohlhammer (1st ed., 1986).
Steller, M. & Koehnken, G. (1989). Criteria-based statement analysis. Credibility assessment of children’s statements in sexual abuse cases. In Raskin, D.C. (Ed.), Psychological Methods for Investigation and Evidence (pp. 217–245). New York: Springer.
Steller, M. & Volbert, R. (1999). Forensisch-aussagepsychologische Begutachtung (Glaubwürdigkeitsbegutachtung), Gutachten für den BGH. Praxis der Rechtspsychologie, 9, 46–112.
Undeutsch, U. (1967). Beurteilung der Glaubhaftigkeit von Aussagen. In Undeutsch, U. (Ed.), Handbuch der Psychologie. Forensische Psychologie, Band 11 (pp. 26–181). Göttingen: Hogrefe.
Webster, C., Harris, G., Rice, M., Cormier, C. & Quinsey, V. (1994). The Violence Prediction Scheme: Assessing Dangerousness in High Risk Men. Toronto: Centre of Criminology, University of Toronto.
Westhoff, K. & Kluck, M.-L. (1998). Psychologische Gutachten schreiben und beurteilen (3rd ed.). Berlin: Springer (1st ed., 1991).
Westhoff, K., Terlinden-Arzt, P. & Klueber, A. (2000). Entscheidungsorientierte Psychologische Gutachten für das Familiengericht. Berlin: Springer.

Marie-Luise Kluck and Karl Westhoff

Related Entries

Process (The Assessment Process), Children Custody
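
Supplementary illustration for the section on Prognosis of Offender Recidivism: the balance between false positive and false negative judgements depends on the base rate of recidivism in the comparison group. The sketch below is not part of the procedures cited in this entry. It assumes a simple yes/no prognosis whose sensitivity and specificity are known, and all figures and names are hypothetical.

```python
def predictive_values(base_rate, sensitivity, specificity):
    """Positive and negative predictive value of a yes/no prognosis (Bayes' rule)."""
    for name, value in (("base_rate", base_rate),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    true_pos = base_rate * sensitivity                    # recidivists flagged as high risk
    false_neg = base_rate * (1.0 - sensitivity)           # recidivists judged low risk
    false_pos = (1.0 - base_rate) * (1.0 - specificity)   # non-recidivists flagged
    true_neg = (1.0 - base_rate) * specificity            # non-recidivists judged low risk
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Hypothetical figures: 20% base rate, prognosis correct in 80% of cases of either kind.
result = predictive_values(base_rate=0.20, sensitivity=0.80, specificity=0.80)
print(f"share of 'high risk' judgements that are correct: {result['ppv']:.2f}")
print(f"share of 'low risk' judgements that are correct:  {result['npv']:.2f}")
```

Under these assumed figures about half of the 'high risk' judgements would be false positives, and the share grows as the base rate falls. This is one way to see why the entry ties each prognosis criterion to the base rate of comparable individuals.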

A APPLIED FIELDS: GERONTOLOGY

THE NEED FOR PSYCHOLOGICAL ASSESSMENT OF OLDER ADULTS

Older adults and particularly those frequently described as the ‘oldest old’ (85+) represent the fastest growing population subgroup in most (industrialized) countries around the world. Although high competence characterizes the majority of today’s elders (Lehr & Thomae, 2000), a whole gamut of critical situations related to ageing, and particularly to very old age, underscores the need for psychological assessment in older adults. Psychological assessment provides a rational, scientific means for making decisions in these situations, prototypical examples of which are residential decisions (e.g. relocation to an institution or within institutions), treatment decisions (e.g. early diagnosis of dementia coupled with a promising cognitive training intervention), or rehabilitation decisions (e.g. the estimation of an individual’s rehabilitation potential and remaining plasticity).

In order to define the content of this article, we first draw from Lawton and Storandt (1984), who suggested a broad conception of assessment: ‘An attempt to evaluate the most important aspects of the behaviour, the objective, and the subjective worlds of the person (...)’ (p. 258). Second, we argue for a theoretical framework to organize the different types of assessment and numerous instruments found in this rapidly evolving field of gerontology. Our suggestion is to roughly distinguish between three assessment approaches: (1) Person-oriented (P) assessment is aimed to address the older person’s cognitive and behavioural competence, personality, and psychological aspects of health. (2) Environment-oriented (E) assessment addresses the social and the physical environment of the ageing person. (3) Finally, the assessment of PE outcomes evaluates the impact of person–environment transactions on major domains of life quality such as subjective well-being, affect, and mental health. Below, we use this line of thinking to review psychological assessment in gerontology. The challenges of assessing older persons in terms of application and theoretical–methodological issues are discussed shortly thereafter. We end this entry with some general conclusions and the consideration of future perspectives.

[8.8.2002–12:29pm] [1–128] [Page No. 63] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword 64 Applied Fields: Gerontology

MAIN APPROACHES IN THE ASSESSMENT OF OLDER PERSONS AND THEIR ENVIRONMENTS

The following overview draws from both old and new treatments of the assessment of older adults (e.g. Kane & Kane, 2000; Lawton & Storandt, 1984; Lawton & Teresi, 1994). Due to space limitations, each theoretical domain is illustrated using a small number of prototypical instruments that essentially reflect the construct or family of constructs in question (see also Table 1).

Person-Oriented Assessment

Cognitive and Behavioural Competence

Cognition is a major aspect of behavioural competence which undergoes particular decline in the later years. However, two reservations are warranted: first, this is true only for speed-dependent cognitive abilities (‘fluid intelligence’ in contrast to ‘crystallized’ intelligence); second, pronounced interindividual variability in performance is characteristic of old age. To test an individual’s intellectual ability against the norm, the well-known Wechsler Adult Intelligence Scale (WAIS) is a classic in the field of ageing (Wechsler, 1981). Also, while there is a high correlation between cognitive functioning and the so-called ‘Activities of Daily Living’ (ADL; basic activities such as eating, washing, or dressing) as well as the ‘Instrumental Activities of Daily Living’ (IADL; more complex activities such as preparing meals, using the phone, or shopping), a separate assessment of ADL and IADL is nevertheless recommended to afford a comprehensive picture of the everyday competence of the older person. Respective assessment procedures (e.g. the classic scale proposed by Lawton and Brody, 1969) have proven to be powerful predictors of institutionalization and mortality. To further complement the evaluation of everyday competence, an additional assessment of leisure activities using an activity list or diary is helpful (Mannell & Dupuis, 1994).

Personality

There has been some debate in psychological gerontology regarding the question of whether personality traits such as the ‘Big Five’ (neuroticism, extraversion, openness to experience, agreeableness, conscientiousness; Costa & McCrae, 1985) remain stable across the adult lifespan. Moderate stability has been widely confirmed, with a tendency toward lower stability over correspondingly longer observation periods. From a practical perspective, a recurring question is whether so-called ‘problem behaviours’ (such as antisocial behaviour, health-related risk behaviours, or the nonuse of existing competencies) may be better explained by individual differences in personality. In this regard, the NEO Personality Inventory (Costa & McCrae, 1985) is a classic assessment device that has been used intensively with elders. Reservations have to be made regarding the practical utility of these and other personality instruments with respect to the very old and those suffering from mild cognitive impairments; short scales with easily understood items are still rare. Besides standardized testing, a careful semi-structured exploration of the biography and major (and often critical) turning points therein is essential for an in-depth understanding of an older person’s current strengths and weaknesses (Lehr & Thomae, 2000).

In a process-oriented perspective of personality, two constructs are particularly useful to explain situation-specific outcomes such as subjective well-being: coping and control. A classic coping instrument is the Ways of Coping Checklist, which has also proved useful in a shortened version suited to assessing the very old (Folkman, Lazarus, Pimley & Novacek, 1987). For measurement of perceived control, we recommend a short instrument newly developed within the context of the Berlin Aging Study (Smith & Baltes, 1999; Smith, Marsiske & Maier, 1996).

Health

Gaining clarity on the influences of health impairments is important for psychological assessment in any age group. However, this is particularly true for older persons. Chronic conditions and multimorbidity occur frequently in later life and are among the most influential explanations of subjective well-being, depression, and the loss of independence. From a psychological perspective, the subjective evaluation of health based on a single-item assessment (‘How would you rate your overall health at the present


Table 1. Recommendation of assessment instruments for use with older adults (a)

Assessment domain | Prototypical instrument | Application issues and selected psychometric information (b)

Person-oriented assessment
Cognitive and behavioural competence | Wechsler Adult Intelligence Scale (WAIS) (Wechsler, 1981) | Very widely used; takes about 1.5 hours to administer (c); Cronbach’s alpha of all subscales >0.70; broad evidence underlining validity.
Cognitive and behavioural competence | Activities/Instrumental Activities of Daily Living Scale (ADL/IADL) (Lawton & Brody, 1969) | Very widely used; takes about 5 minutes to administer; Cronbach’s alpha of both scales >0.80; inter-rater r 0.61 (ADL) and 0.91 (IADL); broad evidence underlining validity.
Personality | NEO Personality Inventory (Costa & McCrae, 1985) | Very widely used; takes about 20 minutes to administer; Cronbach’s alpha of all subscales >0.70; broad evidence underlining validity.
Personality | Ways of Coping Checklist (Short) (Folkman et al., 1987) | Frequently used; takes about 10 minutes to administer; Cronbach’s alpha of subscales 0.47–0.74; some evidence underlining validity.
Personality | Perceived Control (Smith et al., 1996) (d) | Instrument introduced in the Berlin Aging Study; takes about 10 minutes to administer; some evidence underlining reliability and validity.
Health (psychological aspects) | SF-36 (Ware & Sherbourne, 1992) | Frequently used; takes about 10 minutes to administer; Cronbach’s alpha of subscales 0.57–0.94; some evidence underlining validity.

Environment-oriented assessment
Social environment | Social Networks in Adult Life Survey (Kahn & Antonucci, 1980) | Frequently used; administration time depends on persons nominated as social network members; on average about 30 minutes; reasonable degree of convergence between respondents’ and significant others’ report; some evidence underlining validity.
Social environment | UCLA Loneliness Scale (Russell et al., 1980) | Frequently used; takes about 10 minutes to administer; Cronbach’s alpha >0.90; some evidence underlining validity.
Social environment | Burden Interview (Zarit et al., 1980) | Frequently used; takes about 10 minutes to administer; Cronbach’s alpha >0.70; some evidence underlining validity.
Physical environment | The Housing Enabler (Iwarsson, 1999) | Recently developed instrument; takes about 1.5 hours to administer; inter-rater reliability mean kappas for the different domains assessed 0.68–0.87; some evidence underlining validity.
Physical environment | Multiphasic Environmental Assessment Procedure (Moos & Lemke, 1996) | Frequently used; data-collection time depends on the size of the institution to be assessed; can take up to about 1 week; Cronbach’s alpha of subscales 0.44–0.96; some evidence underlining validity.

Assessment of person–environment outcomes
Subjective well-being and affect | Philadelphia Geriatric Center Morale Scale (PGCMS) (Lawton, 1975) | Very widely used; takes about 10 minutes to administer; Cronbach’s alpha >0.80 (total score); broad evidence underlining validity.
Subjective well-being and affect | Scales of Psychological Well-being (Ryff, 1989) | Frequently used; takes about 20 minutes to administer; Cronbach’s alpha of all subscales >0.70; some evidence underlining validity.
Subjective well-being and affect | Positive and Negative Affect Schedule (PANAS) (Watson et al., 1988) | Frequently used; takes about 5 minutes to administer; Cronbach’s alpha >0.70; some evidence underlining validity.
Mental health | Center of Epidemiological Studies of the Elderly Depression Scale (CES-D) (Radloff, 1977) | Very widely used; takes about 10 minutes to administer; Cronbach’s alpha >0.80; broad evidence underlining validity.
Mental health | Mini-Mental-State Examination (MMSE) (Folstein et al., 1975) | Very widely used; takes about 10 minutes to administer; inter-rater r >0.80; broad evidence underlining validity.

(a) See also the additional description of these instruments in the text.
(b) The psychometric information given here is based on additional published evidence, which is not explicitly cited in this article due to space limitations.
(c) The estimation of duration always refers to administration with old and very old persons.
(d) We recommend direct contact with the authors of this instrument for more information.
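Most entries in Table 1 summarize reliability with Cronbach’s alpha. As a reminder of what that coefficient expresses, the following is a minimal sketch in Python, assuming complete item-level responses (no missing data) and the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The function name and the toy responses are hypothetical illustrations and are not taken from any of the instruments listed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a list of respondents.

    `responses` is a list of rows; each row holds one respondent's
    scores on the same k items (k >= 2, no missing values assumed).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer all items")

    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the total (sum) score across respondents.
    total_variance = variance([sum(row) for row in responses])

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five persons to a three-item scale.
toy_responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(toy_responses)
print(f"alpha = {alpha:.2f}; above 0.70: {alpha > 0.70}")
```

The comparison against 0.70 mirrors the threshold quoted for several subscales in Table 1; it is a convention for reading the coefficient, not a substitute for the validity evidence the table also refers to.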

time: excellent, good, fair, or poor’) has proven to be a powerful predictor of subjective well-being in many studies. A multi-item assessment of this construct as well as other health-related aspects is provided by the now classic SF-36 (Ware & Sherbourne, 1992). Pain is frequently overlooked in its impact on everyday life and well-being; the assessment of pain and its psychosocial impact is therefore recommended as a must for any comprehensive health evaluation of older adults (Parmelee, 1994).

Environment-Oriented Assessment

Social Environment

Aspects of the social environment include the objective size of the social network, the amount of real and perceived social support, interpersonal conflicts, and overall social network evaluations, such as loneliness. Caregivers are a significant part of elders’ social environments. A classic instrument to measure social network size as well as some of its major qualitative characteristics is the Social Networks in Adult Life Survey (Kahn & Antonucci, 1980). This instrument defines social network membership using concentric circles, an approach that has proven to be very helpful in differentiating members of the social network in terms of closeness and importance. Another well-established tool to assess the existing network is the UCLA Loneliness Scale (Russell, Peplau, & Cutrona, 1980), which addresses how often the respondent feels isolated and misunderstood and wishes to be involved in more social relationships. Caregivers deserve the attention of psychologists as well, given the extensive strain associated with this task and the increased risk of becoming physically and mentally ill. An instrument for assessing the stress of caregivers is the Burden Interview suggested by Zarit and colleagues in the early 1980s (Zarit, Reever, & Bach-Peterson, 1980).

Physical Environment

Physical environments optimally adapted to the needs of frail elders can take on powerful supportive and stimulating functions in old age (for a review of the relevant empirical literature,


see Wahl, in press). Gitlin (1998) concluded in her review of checklists providing a comprehensive assessment of the home environment that the psychometric properties of most of these devices are at best unclear. Among the rare strictly tested instruments, we would recommend ‘The Housing Enabler’ as a promising tool that carefully considers the physical home environment as well as the functional profile of older persons acting within these environments (Iwarsson, 1999). Although many different suggestions have been tossed around, there is no single device with well-proven psychometric properties currently available. In contrast, the assessment of institutional environments serving the elderly has received much attention and more focused research efforts. A comprehensive measurement device is the Multiphasic Environmental Assessment Procedure (MEAP), which is based on a wide-ranging research program conducted by Moos and associates (Moos & Lemke, 1996) and has also been transferred to other countries (e.g. Fernandez-Ballesteros et al., 1991).

Assessment of Person–Environment Outcomes

Subjective Well-being and Affect

Subjective well-being, or the cognitive and affective evaluation of the past and present life, has been regarded as a major indicator of successful ageing. Probably the most highly renowned instrument is the Philadelphia Geriatric Center Morale Scale (PGCMS) (Lawton, 1975). This relatively easy-to-use 17-item scale covers three dimensions of subjective well-being, i.e. agitation, satisfaction with the ageing process, and general life-satisfaction. Due to the clinical nature of this instrument, with many items addressing negative thoughts and emotions, it is particularly useful in the clinical, psychological evaluation of an older person, while other instruments more thoroughly address the positive facets of subjective well-being (e.g. Ryff, 1989).

Compared to subjective well-being, the measurement of affect has not yet received very much empirical attention (Labouvie-Vief, 1999). The term ‘affect’ includes emotions, moods, and feeling states, all of which can be assessed in terms of intensity, frequency, and duration. A promising assessment tool for use with elders is the Positive and Negative Affect Schedule (PANAS) suggested by Watson, Clark, & Tellegen (1988).

Mental Health

Within the spectrum of mental health threats in later life, depression is, besides dementia, the major disease, whose optimal detection requires a combination of expertise from clinical psychology and psychiatry. The Center of Epidemiological Studies of the Elderly Depression Scale (CES-D) introduced by Radloff (1977) is widely used, has proven psychometric properties, and works well in elderly populations. Although a score of 17 is widely accepted as an indication of a depressive illness, it is wise to always include at least one other source of information (such as a clinical expert rating) before a final diagnostic decision is made. In addition, because severe cognitive impairments substantially increase as people age (with estimated dementia rates of about 25% beyond the age of 85 years), dementia assessment should be included as a routine part of every older person’s clinical evaluation. A classic screening test in this regard is the Mini-Mental State Examination (MMSE), originally suggested by Folstein, Folstein, and McHugh (1975). A major advantage of this widely used device is its scoring system, which is well known among clinicians and thus significantly facilitates communication (a score of 23 is generally recommended as indicative of cognitive dysfunction).

SPECIFIC CHALLENGES OF ASSESSING OLDER ADULTS

A number of factors can threaten the internal and external validity of assessing older persons. In the following, only a selective overview of these issues can be provided.

Two messages are important in terms of practical test application: on the one hand, old age is associated with a slowing in fine motor functioning and reaction time, the loss of sensory functioning, and cognitive impairment. One consequence of this is that performance tests that require motor behaviour may not be adequate, at least in some elderly subpopulations (such as geriatric patients).


Furthermore, scales which are normally self-administered (e.g. personality tests) must frequently be administered by a third person, which means, as compared to other age groups, a substantial change in the social psychology of the test situation, for instance in terms of self-disclosure. The length of the instrument is particularly critical in the case of very old persons. Furthermore, the response format should remain stable within testing sessions and should be as simple as possible (not more differentiated than a 5-point Likert-type scale). Also, motivational issues, including fatigue, must be considered when creating optimal test circumstances. On the other hand, test strategies found to be very effective and economical in younger persons, such as phone and computer-based assessment, can, in many cases, be transferred to older people as well. With respect to demented elders, the use of observational methods is frequently the only well-functioning assessment procedure for evaluating behaviour and inner states.

A major theoretical-methodological challenge of assessing older persons is the issue of construct invariance. For instance, constructs such as depression or pain might have a fundamentally different meaning at the age of twenty than at the age of ninety years. Moreover, measures might have age-related characteristics with respect to response bias, response format, or the production of missing data. These and other issues as well as tentative solutions have been addressed intensively by Teresi and Holmes (1994).

To conclude, we urge researchers and practitioners to adopt an attitude of ‘constructive caution’ in interpreting and using test results gathered in elderly populations.

FUTURE PERSPECTIVES AND CONCLUSIONS

The assessment of older persons is an important field of gerontology in terms of research and application. Due to the multitude of measurement instruments suggested in the gerontological literature, it is essential to carefully check the proven psychometric properties and practical usefulness of these devices for making adequate instrument selections. Standardized tests, semi-structured assessments, and observational methods should serve as complementary tools in any comprehensive clinical evaluation. An important task of future research is, as is so often the case, replicative research including different subgroups of elders and the revision of existing devices in order to improve the critical mass of good instruments. The assessment procedures so developed should provide a broad, reliable, and valid description of both the positive and negative sides of the ageing individual.

Acknowledgement

Comments of David Burmedi and Mike Martin on an earlier draft of this entry are very much appreciated.

References

Costa, P.T. & McCrae, R.R. (1985). The NEO-Personality Inventory. Manual Form S and Form R. Odessa, FL: Psychological Assessment Resources.
Fernandez-Ballesteros, R. et al. (1991). Evaluation of residential programs for the elderly in Spain and United States. Evaluation Practice, 12, 159–164.
Folkman, S., Lazarus, R.S., Pimley, S. & Novacek, J. (1987). Age differences in stress and coping processes. Psychology and Aging, 2, 171–184.
Folstein, M.F., Folstein, S.E. & McHugh, P.R. (1975). Mini mental state: a practical method of grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198.
Gitlin, L.N. (1998). Testing home modification interventions: issues of theory, measurement, design, and implementation. In Schulz, R., Maddox, G. & Lawton, M.P. (Eds.), Focus on Interventions Research with Older Adults, Vol. 18 (pp. 190–246). New York: Springer.
Iwarsson, S. (1999). The housing enabler: an objective tool for assessing accessibility. British Journal of Occupational Therapy, 62, 491–497.
Kahn, R.L., & Antonucci, T.C. (1980). Convoys over the life course: attachment, roles, and social support. In Baltes, P.B. & Brim, O.G. (Eds.), Life-span Development and Behaviour (pp. 253–286). New York: Academic Press.
Kane, R.L. & Kane, R.A. (Eds.) (2000). Assessing Older People: Measures, Meaning and Practical Applications. New York, NY: Oxford University Press.
Labouvie-Vief, G. (1999). Emotions in adulthood. In Bengtson, V.L. & Schaie, K.W. (Eds.), Handbook of Theories of Aging (pp. 253–267). New York: Springer Publishing.
Lawton, M.P. (1975). The Philadelphia geriatric center morale scale: a revision. Journal of Gerontology, 30, 85–89.
Lawton, M.P. & Brody, E.M. (1969). Assessment of older people: self-maintaining and instrumental activities of daily living. The Gerontologist, 9, 179–185.


Lawton, M.P. & Storandt, M. (1984). Assessment of older people. In Reynolds, P.M. & Chelune, G.J. (Eds.), Advances in Psychological Assessment, Vol. 6 (pp. 236–276). San Francisco: Jossey-Bass.
Lawton, M.P. & Teresi, J.A. (Eds.). (1994). Focus on Assessment Techniques, Vol. 14. New York: Springer.
Lehr, U.M., & Thomae, H. (2000). Psychologie des Alterns [Psychology of ageing] (9th ed.). Wiebelsheim: Quelle & Meyer.
Mannell, R.C. & Dupuis, S.L. (1994). Leisure and Productive Activity. In Lawton, M.P. & Teresi, J.A. (Eds.), Focus on Assessment Techniques, Vol. 14 (pp. 125–141). New York: Springer.
Moos, R.H., & Lemke, S. (1996). Evaluating Residential Facilities: The Multiphasic Environmental Assessment Procedure. Thousand Oaks, CA: Sage.
Parmelee, P.A. (1994). Assessment of pain in the elderly. In Lawton, M.P. & Teresi, J.A. (Eds.), Focus on Assessment Techniques, Vol. 14 (pp. 281–301). New York: Springer.
Radloff, L.S. (1977). The CES-D scale: a self-report depression scale for research in the general population. Journal of Applied Psychological Measurement, 1, 387–393.
Russell, D.W., Peplau, L.A. & Cutrona, C.E. (1980). The revised UCLA loneliness scale: concurrent and discriminant validity evidence. Journal of Personality and Social Psychology, 39, 472–480.
Ryff, C.D. (1989). Happiness in everything, or is it? Explorations on the meaning of psychological well-being. Journal of Personality and Social Psychology, 57, 1069–1081.
Smith, J. & Baltes, P.B. (1999). Trends and profiles of psychological functioning in very old age. In Baltes, P.B. & Mayer, K.U. (Eds.), The Berlin Aging Study. Aging from 70 to 100 (pp. 197–226). Cambridge: Cambridge University Press.
Smith, J., Marsiske, M., & Maier, H. (1996). Differences in Control Beliefs from Age 70 to 105. Unpublished manuscript. Max Planck Institute for Human Development, Berlin.
Teresi, J.A., & Holmes, D. (1994). Overview of Methodological Issues in Gerontological and Geriatric Measurement. In Lawton, M.P. & Teresi, J.A. (Eds.), Focus on Assessment Techniques, Vol. 14 (pp. 1–22). New York: Springer.
Wahl, H.-W. (in press). Environmental influences on ageing and behaviour. In Birren, J.E. & Schaie, K.W. (Eds.), Handbook of the Psychology of Aging (5th ed.). San Diego: Academic Press.
Ware, J.E., & Sherbourne, C.D. (1992). The MOS 36-item short-form health survey (SF-36). Medical Care, 30, 473–483.
Watson, D., Clark, L.A., & Tellegen, A. (1988). Development and validation of brief measures of positive and negative affect: the PANAS scales. Journal of Personality and Social Psychology, 54, 1063–1070.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale – Revised Manual. New York: The Psychological Corporation.
Zarit, S.H., Reever, K.E., & Bach-Peterson, J. (1980). Relatives of the impaired elderly: correlates of feelings of burden. The Gerontologist, 20, 649–655.

Hans-Werner Wahl and Ursula Lehr

Related Entries

Dementia, Quality of Life, Health, Dynamic Assessment (Learning Potential, Testing-the-Limits), Cognitive Plasticity, Cognitive Decline/Impairment, Fluid and Crystallized Intelligence, Auto-Biography, Intelligence Through Cohort and Time, Caregiver Burden.

A APPLIED FIELDS: HEALTH

INTRODUCTION

Health psychology is a field within psychology that is devoted to understanding psychological influences on health-related processes, such as why people become ill, how they respond to illness, how they recover from a disease or adjust to chronic illness, and how they stay healthy in the first place (Schwarzer & Gutiérrez-Doña, 2000). Health psychologists conduct research on the origins and correlates of diseases. They identify personality or behavioural antecedents that influence the pathogenesis of certain illnesses. Health psychologists analyse the adoption and maintenance of health behaviours (e.g. physical exercise, nutrition, condom use, or dental hygiene) and explore the reasons why people adhere to risk behaviours (e.g. why they continue to smoke or drink alcohol). Health promotion and the prevention of illness are, therefore, agendas for research and practice, as is


the improvement of the health care system in general.

In health psychology, a multitude of variables are assessed, such as physical conditions, health behaviours, quality of life, coping with stress or illness, coping resources, and premorbid personality. Since health behaviours dominate the discipline, the following contribution will focus on this particular subarea.

HEALTH BEHAVIOURS

Many health conditions are caused by such behaviours as problem drinking, substance use, smoking, reckless driving, overeating, or unprotected sexual intercourse. Health behaviours are often defined as behaviours that people engage in to maintain or improve their current health and to avoid illness. They include any behaviour a person performs in order to protect, promote, or maintain his or her health, whether or not such behaviours are objectively effective towards that end (Conner & Norman, 1996; Schwarzer & Renner, 2000). People are inconsistent in the way they practice multiple health behaviours. For example, a person who exercises regularly does not necessarily adhere to a healthy diet. One reason people’s current health habits are not more consistent is that they differ on a number of dimensions (see Table 1). For a valid and reliable measurement of health behaviours, it is essential to distinguish between these dimensions and to define clearly the subject matter under investigation.

Table 1. Dimensions of health behaviours

Voluntary; consciously undertaken by the individual | Involuntary; unconsciously undertaken by the individual
Avoidance of harmful activities | Engagement in protective activities
Undertaken without medical assistance | Needs professional medical assistance
Vital | Non-vital
Occasional; unstable | Habitual; stable
Simple | Complex, multifaceted

ASSESSMENT OF HEALTH BEHAVIOURS

There are various methods of assessing health behaviours (Renner, in press). Questionnaires that assess the frequency of past behaviour are the most commonly used methods. There are numerous questionnaires that ask for the average or typical quantity and frequency of alcohol consumption (for an overview, see Sobell and Sobell, 1995), dietary habits, or physical activity. However, the information provided by quantity and frequency measures (QF estimates) is limited because respondents must base their estimates on a large variety of experiences. QF estimates often reflect less drinking and tend to misclassify drinkers compared to daily diary or timeline reports. They also provide lower absolute food intake estimates than a longer, interviewer-administered diet history.

On rare occasions, physiological methods can be used, which are most accurate for measuring alcohol consumption (via blood or urine sampling), drug consumption (via immunoassay, hair or sweat bioassay procedures), habitual dietary intakes (via biochemical markers), or physical activity (via doubly labelled water). However, such bioassay methods are only required when a high level of accuracy about recent health behaviour is needed (e.g. for workplace drug testing). They can also be used in addition to self-report data in order to confirm or falsify self-report information (e.g. about recent drug use). In some circumstances, however, it may only be necessary to lead respondents to believe that there is an objective way to identify their behaviours via physiological measures, which is done to reduce misreporting. Another direct method is behavioural observation, used to assess physical activity among children or a driver’s speed, for example.

Unstructured or semistructured interviews are qualitative techniques for research on understanding individuals’ cognitive and conceptual models of health behaviours and the frames of reference used to organize these behaviours. Therefore, qualitative methods are mainly concerned with exploration and analysis of health behaviour because they


allow the interviewee to address the issues that are relevant to the topics raised by the investigator. One major disadvantage of qualitative methods is that generality is, by definition, not quantifiable. Furthermore, since interviews are not anonymous, self-reports may be affected by social desirability biases, which lead to overreporting of socially desirable behaviours as well as underreporting of socially undesirable behaviours.

Stone and Shiffman (1994) have labelled strategies for collecting self-reports of respondents’ momentary or current state as Ecological Momentary Assessment (EMA). EMA studies usually consist of repeated assessment of participants’ momentary state as they go about the tasks of daily living in their natural environment. Interval-contingent assessments require assessment at regular intervals. One example is the method of interactive voice response, where alcoholics are asked to call in on a regular basis to report their drinking status to the interviewers. Another way is asking respondents to record every episode of smoking, eating, or another behaviour of interest. This event-contingent approach may not lead to a representative sample of the participant’s general state, and it requires a clear definition of the triggering event. In contrast, signal-contingent sampling supplies participants with an external signal cue that is usually timed to be emitted at random to prompt them to complete a written assessment or an electronic diary. Signal device beepers, electronic watches, and palmtop computers can be used. EMA is a method that precisely assesses recent health behaviours. Its major advantage is that it minimizes deviations due to recall from memory by relying on respondents’ reports of their experience at the very moment of inquiry.

A diary log is a data collection strategy that gathers information as time passes. The distinctive feature of this method is that it yields information that is temporally ordered. It shows the sequence of events and the profile of actions across time. Diary techniques can be particularly useful when data from the same person are required over a considerable period of time and/or very frequently, such as assessing smoking behaviour, alcohol consumption, or dietary habits, in order to provide a general estimate of the amounts consumed. For example, alcohol consumption diaries often include questions about the frequency of drinking, the type of drink, and the typical quantity consumed on each occasion. In comparison to questionnaires, the diary log format minimizes recall biases associated with retrospective reporting, but daily reporting may be more reactive. In addition, diaries could be valuable for getting access to so-called ‘intimate’ information (e.g. sexual behaviour).

Timeline Followback Method Reports (TLFB), developed by Sobell and Sobell (1995), provide a detailed insight into health behaviours (smoking, taking drugs, drinking, etc.) over a designated time period. Participants are asked to provide retrospective estimates of their daily behaviour by using a calendar over a certain time period, ranging up to 12 months prior to the interview. With this method, the pattern, variability, and level of drinking or smoking can be profiled, which is especially useful when precise estimates are needed or when researchers wish to evaluate specific changes in health behaviours before, during, and after interventions. However, this is a rather time-consuming method.

BIASES IN SELF-REPORTS

Some problems shared by all surveys relying on self-reports could seriously decrease internal and external validity (Schwarz & Strack, 1991). Short-term fluctuations, such as in substance use, produced by environmental (e.g. social settings) and psychological (e.g. mood or stress) variables, may affect the psychometric properties of usage measures. For example, there is a tendency for students to become increasingly exuberant as their high school graduation approaches. Increased party activity during the spring months contributes significantly to the actual level of drug use. Therefore, seasonal effects and short-term fluctuations may lead to superficial behavioural changes that could be misinterpreted by researchers as being genuine changes.

Questions about past behaviours assume accurate memory of events as well as willingness to report them to a researcher. However, respondents might not recall the actual events, employing instead various cognitive heuristics (rules of thumb) to estimate frequencies. This could result in certain

[8.8.2002–12:29pm] [1–128] [Page No. 71] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword 72 Applied Fields:

biases. Individuals use different strategies to answer References frequency questions over different time spans. Episodic enumeration (recalling and counting Conner, M. & Norman, P. (Eds.) (1996). Predicting individual incidents) is more likely to be used Health Behaviour: Research and Practice With with shorter time spans in frequency reports, Social Cognition Models. Buckingham, England: whereas rate-based estimation (projecting the Open University Press. Renner, B. (2001, in press). Assessment of health typical rate over the length of the recall period) is behaviours. In Smelser, N.J. & Baltes, P.B. more likely to be used when longer time spans are (Eds.), The International Encyclopedia of the involved. Reported behavioural frequencies for a Social and Behavioural Sciences. Oxford, Eng- year are generally lower than 12 times the land: Elsevier. Schwarzer, R. & Gutie´rrez-Don˜ a, B. (2000). Health equivalent frequencies for a month. People prob- Psychology. In Pawlik, K. & Rosenzweig, M.R. Inter- ably forget more behavioural instances over the national Handbook of Psychology (pp. 452–465). time span of a year than over a month. Therefore, London: Sage. behavioural reports over a month are the more Schwarzer, R. & Renner, B. (2000). Social-cognitive accurate of the two. The use of different time spans predictors of health behaviour: action self-efficacy and coping self-efficacy. Health Psychology, 19(5), across or within studies may lead to inconsistent or 487–495. even misleading results. Schwarz, N. & Strack, F. (1991). Context effects in Accurate and reliable measurements of health attitude surveys: applying cognitive theory to social behaviours, especially drug use and sexual research. In Stroebe, W. & Hewstone, M. (Eds.), activity, have proven to be difficult because of European Review of Social Psychology, Vol. 2, (pp. 31–50). Chichester, England: Wiley. social desirability influences. People underreport Sobell, L.C. & Sobell, M.B. (1995). 
Alcohol con- smoking and underestimate alcohol consumption. sumption measures. In Allen, J.P. & Columbus, M. Self-reports of alcohol consumption can account (Eds.), Assessing Alcohol Problems (pp. 55–73). for as little as half the amount obtained from NIAAA Treatment Handbook Series 4. Bethesda, sales figures. Likewise, the total number of MD: NIH. Stone, A.A. & Shiffman, S. (1994). Ecological cigarettes sold or otherwise estimated to be momentary assessment (EMA) in behavioural consumed is substantially higher than the medicine. Annals of Behavioural Medicine, 16, estimate calculated from smokers’ self-reports. 199–202. In addition, studies that focus on behavioural frequencies consistently yield illusory superiority: Britta Renner and Ralf Schwarzer respondents report a lower frequency of unhealthy behaviours and higher frequency of Related Entries healthy behaviours for themselves than for an average peer. Illicit problem behaviours, such as Health, Quality of Life, Interview in Behaviour- drug or alcohol use, may elicit stronger self- al Clinical and Health Settings. Brain Activity serving biases than more mundane health- Measurement, Goal Attainment Scaling, Psycho- threatening behaviours in adolescents (for details, physiological Equipment and Measurements, Outcome Assessment/Outcome Treatment. see Renner, in press).

APPLIED FIELDS: NEUROPSYCHOLOGY

INTRODUCTION

Neuropsychological assessment as a formal procedure is a relatively recent development. Its evolution has paralleled advances, in the past fifty years, in the areas of neuroscience in general, and in particular. It has also been influenced by developments in applied clinical disciplines such as neurology, neuroradiology, rehabilitation medicine, special education, geriatrics, developmental psychology, etc. In this section, we review the historical trajectory of this aspect of clinical neuropsychology, and present the current state of the field.

HISTORICAL ANTECEDENTS

Neuropsychological assessment did not come of age until after the Second World War. In the second half of the 19th century, there had been a flurry of clinical studies that correlated brain structures and cognitive activity. The work of Broca, Déjerine, Jastrowitz, Korsakoff, Lichtheim, Liepmann, Oppenheimer, Ribot, Wernicke, and many others in the latter part of the 19th century described the neurological substrates of disorders such as the aphasias, apraxias, amnesia, and frontal disinhibition (Walsh, 1978; Benton, 2000). However, these advances in localization of function lay dormant (except in the USSR) for over half a century. This approach regained its popularity in the 1950s and 1960s, in part as a result of the work of Brenda Milner and her colleagues in Montreal, who described the pivotal role of the hippocampus in memory (Scoville & Milner, 1957), and in part due to the work of Benton, Zangwill, Hécaen, Ajuriaguerra, and Goodglass. Sperry’s work and the seminal description of a human deconnection syndrome (Geschwind & Kaplan, 1962) lent further impetus to the idea that higher cognitive functions could be componentialized and subjected to analysis via objective techniques. Interest in the pioneering 19th century studies and their potential contribution to the study of brain–behaviour relationships was revived by Norman Geschwind in Boston at approximately the same time (Geschwind, 1997).

PARADIGMS IN NEUROPSYCHOLOGICAL ASSESSMENT

Global Measures of Brain Damage

At the outset, the primary goal of the neuropsychological evaluation in the United States was to assist in differentiating behavioural disorders of ‘organic’ (i.e. structural) nature, from those of ‘functional’ (i.e. psychological) origin. This focus can be attributed to the influence of psychoanalytic thinking, which postulated that psychiatric disturbance could result from intrapsychic (moral and psychological) and disturbed inter-personal relationships (Hill, 1978: vii). Further, clinicians in the USA and Britain were formed in a positivist, psychometric culture, which has more readily trusted an actuarial, mechanistic approach to data gathering, and statistically driven decision-making algorithms (Meehl, 1954), while being less comfortable with the methodology of single-case studies. Thus Ward Halstead’s purpose in designing tests was to determine whether a person had sustained brain damage or not; asking, ‘more practically, can convenient indices be found which, like blood pressure, accurately reflect the normal and pathological range of variance for the individual? Is there a pathology of biological intelligence which is of significance to psychiatry and to our understanding of normal behaviour?’ (Halstead, 1947: 7). He noted accurately that the tests developed by Binet and standardized by Terman (for the purpose of identifying ‘subnormal’ children who required remediation in school) were completely insensitive to the effects of brain damage. Citing the work of Hebb and Penfield (1940) he wrote, ‘Evidence is now on record to the effect that surgical removal of one or both prefrontal lobes – that is, a mass of brain substance constituting about one-fourth of the total cerebrum – may not significantly alter the I.Q.’ (Halstead, 1947: 7).

Fixed and Flexible Batteries

The Halstead–Reitan Battery (Reitan & Davison, 1974; Reitan & Wolfson, 1993) and extensions of it (e.g. Heaton, Grant & Mathews, 1991) gained widespread recognition in the USA from the 1950s as the best practice in neuropsychological assessment, since it provided a means of summarizing an array of observations into numerical values that can be compared across patients and situations, and which provide reliable predictions (Boll, 1981; Russell, 1986). This battery (the Halstead–Reitan Battery; Reitan & Davison, 1974) began as a selection of seven tests chosen for their ability to best discriminate patients with frontal versus non-frontal or non-injured controls. Currently, five of the original seven tests are typically administered to derive an Impairment Index (the proportion of scores in the impaired range), together with the Wechsler Intelligence scales, memory tests, and other tests of specific functions (Lezak, 1995: 709). The five tests include the Category Test, the Tactual Performance Test, the Seashore Rhythm Test, the Finger Oscillation Test, and the Speech-Sounds Perception Test.

Halstead was fully aware of the view that prevailed in the 1930s (and well into the 1960s) that brain dysfunction is unitary (i.e. the notion of equipotentiality). Other tests sensitive to ‘brain damage’ were also available at that time. A well-known example is the Visual Motor Gestalt Test (Bender, 1938), now commonly referred to as the Bender–Gestalt test. Piotrowski might be credited with developing the first ‘impairment index’ when he stated (in reference to interpretation of responses to the Rorschach ink blot test) that, ‘No single sign alone points to abnormality in the psychiatric sense, to say nothing of organic involvement of the brain. It is the accumulation of abnormal signs in the record that points to abnormality’ (Piotrowski, 1937, cited in Lezak, 1995: 773). He considered five signs (out of the ten that he proposed) to be the minimal number needed to support an inference of brain damage, and noted that the number of signs increased with age. Halstead insisted on ‘blind’ administration of tests by trained technicians to ensure objectivity of results, although his qualitative observations were based on an impressive variety of sources. The use of cut-off scores (usually one and a half or two standard deviations from the mean, indicating impairment) and an Impairment Index (the number or proportion of tests on which the patient’s score equals or exceeds the cut-off) as applied to the Halstead–Reitan battery (Reitan & Davison, 1974) attests to the influence of then prevalent theories of brain function on neuropsychological test interpretation. Nonetheless, both Halstead and later Reitan rejected the notion that brain function is unitary, based on the fact that patients with lesions in different areas produced different patterns of performance on the tests (Halstead, 1947; Reitan & Davison, 1974). Over time, there was recognition that identifiable neurological syndromes exist, and rather than apply a fixed battery of tests to everyone, regardless of the diagnosis, a flexible battery approach, espoused by Benton, in which standardized tests are selected to assess the functions most likely to be affected by the presenting conditions, has come to be preferred by the majority of clinicians in the United States.

Alternatives to the Psychometric Approach

The psychometric approach has not gone unchallenged. One of the pillars in the area of assessment in the USA, Anne Anastasi, expressed early concerns about the indiscriminate use of standardized assessment with diverse populations (Anastasi & Cordova, 1953). Further, the essential tenet of this approach is that ‘the final solution to a problem, arrived at within a given time, is an objective measure of an underlying cognitive mechanism’ (Kaplan, 1988: 129). A number of people have taken issue with such a premise, pointing to the multifactorial nature of the tasks used for assessment, and the various routes that an individual can take to reach a solution (e.g. Luria, 1966; Walsh, 1978; Kaplan, 1988). The score-based approach to assessment is quite different from an attempt to understand brain–behaviour relationships in terms of the way in which the organism or person interacts with the environment to attain a goal, regardless of the integrity of the nervous system. As early as the mid-1920s, Luria and his mentor Vygotsky in the USSR had decided that the best approach to understanding higher cognitive functions was two-pronged: to study their normal development on the one hand, and their ‘decomposition’ in brain-damaged individuals on the other. Vygotsky felt that the earlier work of the 19th century neurologists was limited by the absence of an adequate theory of psychology (Luria, 1979). Luria and his followers emphasized an analysis of performance based on the belief that behaviours are the result of functional brain systems that interact with each other. Thus, a function could be subserved by various subsystems, and difficulty in performing a task could be the result of a breakdown in any of those mechanisms. Conversely, compensatory routes engaging alternate subsystems can sometimes be utilized to achieve the same goal. This approach was particularly relevant to the rehabilitation of individuals who sustained brain damage during World War II. Analysis of the compensatory strategies that are or can be brought into play to reach a goal (that is, an analysis of the different circumstances that elicit or inhibit a given behaviour) provides a basis for intervention that can enhance the individual’s success. Largely for this reason, Luria’s approach to neuropsychological assessment has been widely adopted in rehabilitation centres throughout the world (e.g. Caetano & Christensen, 1997). His work has had a wide-ranging impact in neuropsychological practices and assessment in many countries.

Evolving Procedures and Roles for the Neuropsychologist

Christensen (1979) attempted to systematize Luria’s approach to testing in order to make his procedures more accessible to a wider audience and to present stimuli in a format and sequence consistent with Luria’s conceptualization of cortical functions. In the United States, this approach was assimilated within a quantitative scoring framework by Golden and his colleagues, and is now known as the Luria-Nebraska Neuropsychological Battery (Golden, Purisch, & Hammeke, 1981). This battery is rarely in use today, as it has been widely criticized on a number of both conceptual and methodological grounds (Lezak, 1995). The publication in 1976 of Lezak’s ‘Neuropsychological Assessment’ (now in its 3rd edition), which describes and reviews many tests, as well as syndromes, provided an important resource to the field. One of the legacies of Luria’s conceptualization of a hierarchy of cognitive abilities has been the need to separate the impact of primary on secondary functions (e.g. the need to assess activation and attention as they relate to memory and other higher mental processes). An important distinction must be made, especially in clinical practice, between psychometric testing (which in many clinics is performed by technicians) and neuropsychological assessment (which involves the interpretation and integration of information regarding the patient). A comprehensive neuropsychological evaluation will, at a minimum, address basic attentional, linguistic, visuoperceptual and visuoconstructional, motor, learning and memory, calculations, sequencing, executive and emotional functions, social interactions, and problem-solving abilities. Reviewing the records, obtaining a comprehensive history, conducting family interviews, and analysing the person’s goals and behaviour across different settings and over time provide a more contextualized understanding of the individual as a whole, and better insights into how recommendations can be realistically formulated (Armengol, Kaplan, & Moes, 2001). Attention to the role and possible impact on testing of medication, pain, physical limitations, and mental status (including neurovegetative functions such as sleep, appetite, sensorimotor changes, and mood) is essential.

Technological breakthroughs in the field of neuroimaging, specifically the advent of the CT scan in the early 1970s, and more recently technologies that allow visualization of areas of brain activation (such as functional Magnetic Resonance Imaging (fMRI) and PET/SPECT scanning), along with the availability of more sophisticated neuropsychological evaluation procedures in clinical settings, have gradually changed the focus and role of neuropsychological assessments. No longer is lesion localization the primary aim; rather, it has shifted in the direction of describing and understanding the functional consequences and rehabilitation implications of brain dysfunction. An important exception to this in the USA has been the area of forensic neuropsychology, where the focus continues to be on establishing the presence of structural brain damage following injury, with its functional and prognostic implications. This is particularly a concern in cases of minor head injury, where neuroimaging is likely to be unhelpful and where the potential for malingering is inevitably raised. This has led to interest in measures designed to detect deception (if only to be able to preemptively refute the assertion of malingering in the majority of cases), as well as an appreciation for the need to take into account the baseline incidence in the normal population of symptoms and patterns of test scores, in order to be able to establish the presence or absence of pathology.

In light of relatively new standards for presenting evidence in courtrooms (i.e. the Daubert rule of 1993), clinical neuropsychologists have had to rely on standardized instruments (rather than clinical experimental techniques) to document changes in functioning.
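The cut-off scores, the Impairment Index, and the baseline incidence of low scores in the normal population described in this entry can be illustrated with a small calculation. The following is only a minimal sketch under stated assumptions, not the scoring procedure of any published battery: the function names are hypothetical, scores are assumed to be z-scores scaled so that lower values mean worse performance, and tests are treated as independent and normally distributed, which real batteries are not.

```python
from math import comb
from statistics import NormalDist


def impairment_index(z_scores, cutoff=-1.5):
    """Proportion of tests whose z-score falls at or beyond the cut-off.

    Assumption: scores are scaled so that lower values mean worse performance.
    """
    impaired = sum(1 for z in z_scores if z <= cutoff)
    return impaired / len(z_scores)


def base_rate_of_low_scores(n_tests, cutoff=-1.5, min_low=1):
    """Probability that a healthy person shows at least `min_low` scores at or
    below the cut-off, assuming independent, normally distributed tests."""
    p_low = NormalDist().cdf(cutoff)  # chance of a low score on a single test
    return sum(
        comb(n_tests, k) * p_low**k * (1 - p_low) ** (n_tests - k)
        for k in range(min_low, n_tests + 1)
    )


# Hypothetical patient profile on five tests (z-scores relative to norms).
profile = [-2.0, -1.6, 0.3, 0.1, -0.4]
print(impairment_index(profile))  # 0.4: two of the five scores are beyond the cut-off

# Share of healthy people expected to show one or more low scores on seven tests.
print(round(base_rate_of_low_scores(7), 2))
```

Under these simplifying assumptions, a single low score becomes fairly common in healthy people as the number of tests grows, which is one way to see why isolated low scores need to be interpreted against base rates rather than taken as evidence of pathology on their own.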


Over the years, within the experimental tradition of , investigations of selective deficits in individuals with brain lesions led to the identification of discrete components of complex functions, as well as the development of ingenious and elegant laboratory procedures to demonstrate disconnections, levels of processing, and double dissociation of functions (e.g. Warrington, 1982; Shallice, 1988; McCarthy & Warrington, 1990; Gazzaniga, 1995). Experimental paradigms that have been used with lesioned non-human animals have also been applied in research and clinical settings to see if brain–behaviour relationships established for other species can be successfully applied to the study of humans. A good example is the use of delayed object alternation tasks with individuals who have sustained prefrontal damage (e.g. Oscar-Berman, McNamara, & Freedman, 1991).

Current Trends

Edith Kaplan, who was trained by the developmental psychologist Heinz Werner, has formulated and championed a process approach to neuropsychological assessment. ‘For Werner (1956) every cognitive act involves “microgenesis” (i.e. an “unfolding process over time”). Thus close observation and careful monitoring of behaviour en route to a solution (process) is more likely to provide more useful information than can be obtained from right or wrong scoring of final products (achievement)’ (Kaplan, 1988). The Boston Process Approach, as it is known, attempts to bridge the case study method (grounded in an understanding of neuropsychological syndromes) developed by Luria on the one hand, and the focus on the need for replicable, empirical, and normatively standardized data, on the other. This has been pursued in several ways. Following up on developments in cognitive neuroscience, new tests such as the California Verbal Learning Test (Delis et al., 1987) and the Delis–Kaplan Executive Function System (Delis, Kaplan & Kramer, 2001), were developed to better assess aspects of learning and executive function which are found to differ among patients with different neuropsychological disorders. This approach has also included (a) the addition of standardized procedures to existing tests to assist in clarifying the process underlying a patient’s response (e.g. the Wechsler Adult Intelligence Scale as a Neuropsychological Instrument or WAIS-RNI, and the Wechsler Intelligence Scale for Children as a Process Instrument or WISC-III PI); (b) the addition of new indices to score existing data that allow for better capture of relevant process variables (e.g. new methods to score the Rey–Osterrieth Complex Figure drawings, as developed by Stern et al., 1995); and (c) a conceptual reanalysis of performance on existing tests based on alternative theoretical models (see Poreh, 2000, for examples of this last approach). Poreh (2000) refers to this new trend as the ‘Quantified Process Approach’. One of the potential advantages of computerized approaches to assessment is the ability to capture sequential qualitative aspects of performance, although this potential remains largely unfulfilled at this time.

FUTURE PERSPECTIVES AND CONCLUSIONS

Neuropsychological assessment is central to attempts to understand the biological bases of behaviour. Even as our technology becomes more sophisticated and we unravel genetic codes, behavioural functions must be mapped, and behavioural and cognitive markers for particular syndromes and disorders become more relevant. Structural and functional in vivo neuroimaging techniques provide exciting opportunities to examine patterns of brain activation during the performance of tests and induced psychological states. Neuropsychological assessment must keep pace with the new demands imposed by technological advances and limitations. Tests adapted for presentation during fMRI are good examples of the latter (e.g. Whalen et al., 1998). In the immediate future, the greater use of computerized technologies will open possibilities for more naturalistic assessment, the evaluation of more complex behaviours, and the ability to collect a wide sample of measures, including the incorporation of physiological measures, concomitantly with performance of various activities. One area with particular promise for assessment and rehabilitation is the developing field of virtual reality (Riva, 1997). Neuroimaging has also permitted an analysis of brain functioning in individuals who differ in terms of the ecological demands placed upon them, such as illiterates and bilingual subjects (e.g. Castro-Caldas et al., 1998; Perani et al., 1998). The finding that structural and functional differences emerge under different environmental circumstances reinforces the need to take into account issues relating to ecological validity. That is, tests that have been developed for one population may have limited validity when administered to a different population (this certainly applies to populations in different stages or trajectories of development). Similarly, results that are obtained under one set of circumstances (e.g. the clinic or research laboratory) may not generalize to other, more typical daily tasks and situations. Clearly there is much work to be done in this area.

References

Anastasi, A. & Cordova, F.A. (1953). Some effects of bilingualism upon the intelligence test performance of Puerto Rican children in New York. Journal of Educational Psychology, 44, 1–19.
Armengol, C.G., Kaplan, E. & Moes, E.J. (2001). The Consumer-Oriented Neuropsychological Report. Odessa, FL: Psychological Assessment Resources.
Benton, A. (2000). Historical aspects of cerebral localization. In Riva, D. & Benton, A. (Eds.), Localization of Brain Lesions and Developmental Functions (pp. 1–14). London, England: John Libbey.
Bender, L. (1938). A visual motor gestalt test and its clinical use. American Orthopsychiatric Association, Research Monographs, No. 3.
Boll, T. (1981). The Halstead-Reitan Neuropsychological Battery. In Filskov, S.B. & Boll, T.J. (Eds.), Handbook of Clinical Neuropsychology. New York: Wiley-Interscience.
Caetano, C. & Christensen, A.L. (1997). The design of neuropsychological rehabilitation: the role of neuropsychological assessment. In León-Carrión, J. (Ed.), Neuropsychological Rehabilitation: Fundamentals, Innovations, and Directions. Delray Beach, FL: St. Lucie Press.
Castro-Caldas, A., Petersson, K.M., Reis, A., Stone-Elander, S. & Ingvar, M. (1998). The illiterate brain. Learning to read and write during childhood influences the functional organization of the adult brain. Brain, 121, 1053–1063.
Christensen, A.-L. (1978). Luria’s neuropsychological investigation (2nd ed.). Copenhagen: Munksgaard.
Delis, D.C., Kramer, J.H., Kaplan, E. & Ober, B.A. (1987). The California Verbal Learning Test Manual. San Antonio: Psychological Corporation.
Delis, D.C., Kaplan, E. & Kramer, J.H. (2001). Delis-Kaplan Executive Function System. San Antonio, TX: The Psychological Corporation.
Gazzaniga, M.S. (1995). The Cognitive Neurosciences. Cambridge: Massachusetts Institute of Technology.
Geschwind, N. (1997). Selected Writings. In Devinsky, D.O. & Schachter, S.C. (Eds.), Norman Geschwind: Selected Publications on Language, Epilepsy, and Behaviour. Boston: Butterworth-Heinemann.
Geschwind, N. & Kaplan, E. (1962). A human cerebral deconnection syndrome. Neurology, 12, 675–685.
Golden, C.J., Purisch, A.D. & Hammeke, T.A. (1985). Luria-Nebraska Neuropsychological Battery: Forms I and II. Los Angeles, CA: Western Psychological Services.
Halstead, W.C. (1947). Brain and Intelligence: A Quantitative Study of the Frontal Lobes. Chicago, IL: The University of Chicago Press.
Heaton, R.K., Grant, I. & Mathews, C.G. (1991). Comprehensive Norms for an Expanded Halstead-Reitan Battery: Demographic Corrections, Research Findings, and Clinical Applications. Odessa, FL: Psychological Assessment Resources.
Hill, D. (1978). Foreword to the First Edition of Lishman, W.A., Organic Psychiatry: The Psychological Consequences of Cerebral Disorder. Oxford, England: Blackwell Scientific Publications.
Kaplan, E. (1988). A process approach to neuropsychological assessment. In Boll, T. & Bryant, B.K. (Eds.), Clinical Neuropsychology and Brain Function: Research, Measurement, and Practice (pp. 129–167). Washington, DC: American Psychological Association.
Lezak, M.D. (1995). Neuropsychological Assessment (3rd ed.). New York: Oxford University Press.
Lishman, W.A. (1977). Organic psychiatry. Oxford, England: Blackwell Science.
Luria, A.R. (1966). Higher Cortical Functions in Man. New York: Basic Books.
Luria, A.R. (1979). The making of mind. In Cole, M. & Cole, S. (Eds.), The Making of Mind: A Personal Account of Soviet Psychology. Cambridge: MIT Press.
McCarthy, R.A. & Warrington, E.K. (1990). Cognitive Neuropsychology: A Clinical Introduction. San Diego, CA: Academic Press.
Meehl, P.E. (1954). Clinical versus Statistical Prediction. Minneapolis: University of Minnesota Press.
Oscar-Berman, M., McNamara, P. & Freedman, M. (1991). Delayed response tasks: parallels between experimental ablation studies and findings in patients with frontal lesions. In Levin, H.S., Eisenberg, H.M. & Benton, A.L. (Eds.), Frontal Lobe Function and Dysfunction. New York: Oxford University Press.
Perani, D., Paulesu, E., Sebastian Galles, N., Dupoux, E., Dehaene, S., Bettinardi, V., Cappa, S., Fazio, F. & Mehler, J. (1998). The bilingual brain: proficiency and age of acquisition of the second language. Brain, 121, 1841–1852.
Poreh, A. (2000). The quantified process approach: an emerging methodology to neuropsychological assessment. The Clinical Neuropsychologist, 14, 212–222.
Reitan, R.M. & Davison, L.A. (1974). Clinical Neuropsychology: Current Status and Applications. Washington, D.C.: V.H. Winston & Sons, Inc.
Reitan, R.M. & Wolfson, D. (1993). The Halstead-Reitan Neuropsychological Test Battery: Theory and Clinical Interpretation. Tucson, AZ: Neuropsychology Press.
Riva, G. (1997). Virtual reality in neuro-psycho-physiology: cognitive, clinical and methodological issues in assessment and treatment. Studies in Health Technology and Informatics, Vol. 44. Amsterdam: IOS Press.
Russell, E.W. (1986). The psychometric foundation of clinical neuropsychology. In Filskov, S.B. & Boll, T.J. (Eds.), Handbook of Clinical Neuropsychology. New York: John Wiley & Sons.
Shallice, T. (1988). From Neuropsychology to Mental Structure. New York: Cambridge University Press.
Scoville, W.B. & Milner, B. (1957). Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery, and Psychiatry, 20, 11–21.
Stern, R.A., Singer, E.A., Duke, L.M., Singer, N.G., Morey, C.E., Daugherty, E.W. & Kaplan, E. (1995). The Boston qualitative scoring system for the Rey-Osterrieth complex figure: description and inter-rater reliability. Clinical Neuropsychologist, 3, 309–322.
Walsh, K. (1978). Neuropsychological Assessment: A Clinical Approach. New York: Churchill Livingstone.
Warrington, E. (1982). The fractionation of arithmetical skills: a single case study. Quarterly Journal of Experimental Psychology, 34A, 31–51.
Whalen, P.J., Bush, G., McNally, R.J., Wilhelm, S., McInerney, S.C., Jenike, M.A. & Rauch, S.L. (1998). The emotional counting Stroop paradigm: a functional magnetic resonance imaging probe of the anterior cingulate affective division. Biological Psychiatry, 44(12), 1219–1228.

Carmen Armengol de la Miyar, Elisabeth J. Moes and Edith Kaplan

Related Entries

Memory Disorders, Visuo-Perceptual Impairments, Executive Functions, Voluntary Movements, Traumatic Brain Injury, Equipment for Assessing Basic Processes, Neuropsychological Test Batteries, Outcome Measures in Neuropsychological and Brain Injury Rehabilitation.

APPLIED FIELDS: ORGANIZATIONS

INTRODUCTION

Psychologists interested in describing, diagnosing or changing organizational behaviour are compelled to assess psychological properties of organizations at some stage of their work. It is for this reason that, as in other applied fields, multiple approaches and techniques concerning psychological assessment have been developed and used in organizations. The present article aims to describe a multilevel psychological assessment, adopting a social systems perspective. To this end, we define psychological assessment of organizations, analyse how it is implemented at different levels, and present future perspectives.

CONCEPT AND OBJECTIVES OF ORGANIZATIONAL ASSESSMENT

Psychological assessment of organizations refers to the measurement of human behaviour in organizations using scientific instruments. The primary objective of this assessment is to describe the organization accurately as an individual and collective behaviour system. However, psychological measures should also be relevant in terms of practical implications, serving the purpose of helping managers and other members of the organization to make decisions.

Traditionally, psychological assessment in organizations has been restricted to the measurement of individual differences, implicitly assuming that organizational effectiveness is the result of the aggregation of the psychological characteristics of individuals. This individual level of analysis, however, is limited, and the measurement of the work group and the organization as a whole offers a complementary and more comprehensive assessment. Psychological properties exist at different levels of analysis and all of them contribute to effectiveness. Thus, a multilevel assessment is needed to obtain a deeper description of organizations.

[8.8.2002–12:29pm] [1–128] [Page No. 78] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword Applied Fields: Organizations 79

MAIN TOPICS IN PSYCHOLOGICAL ASSESSMENT AT DIFFERENT LEVELS OF ANALYSES

The Individual Level

There is a persistent interest in the study of individual experiences in organizations, and new topics and controversies emerge continuously (Nord & Fox, 1996). Personality, cognitive, affective, and behavioural variables have been assessed for decades. With this in mind, the most relevant issues currently associated with the measurement of individuals in organizations are summarized in this section.

Personality

Individuals can be characterized by a number of enduring dispositional properties, which help to understand people’s behaviours in organizations. One of the most popular methods of assessing personality is derived from the Big Five theory. Through self-report inventories, five dimensions of personality are measured: (1) extraversion; (2) emotional stability; (3) agreeableness; (4) conscientiousness; and (5) openness to experience. Several authors prefer the use of a composite of several Big Five constructs, labelled an integrity test, because this broader measure can be more reliable in predicting overall job performance. However, narrower trait constructs can show better prediction of specific job performance criteria within specific occupations (Gatewood, Perloff, & Perloff, 2000).

Knowledge, Skill, and Abilities (KASs)

KASs are defined, respectively, as the amount of factual information known by an individual, his/her conduct of job-specific activities, and his/her conduct of generalized job activities. With respect to abilities, different goals are associated with the measurement of general mental ability or ‘g’ versus specific abilities. Although there is some consensus about the predictive efficiency of the ‘g’ factor, measures of specific abilities tend to be more useful when the goals are understanding people’s behaviours or their classification. Given that abilities, as they are measured by aptitude tests, refer to a wide and general range of human experiences, more circumscribed measures of skills and knowledge have been developed in order to improve the validity of measures. This is the case of interpersonal skills, which are especially critical in customer service jobs, work groups, and leadership. Also, job knowledge and tacit-knowledge measures are closely related to specific job performance criteria. For instance, subjects can be exposed to a job-related situation, and their capabilities to solve problem situations can be measured through assessment centre procedures.

Individual Performance

Production (e.g. quantity) and other employee behaviour records (e.g. absenteeism) are used as objective direct or indirect measures of individual performance. Also, subjective evaluations from individuals familiar with the work of the focal person are considered (e.g. 360-degree feedback). These performance indicators are the result of task and contextual performance. The first is defined as the proficiency with which subjects perform core technical activities of well-defined jobs. Thus, cognitive abilities are relevant for predicting task performance. In contrast, contextual performance is defined as extra-task proficiency that contributes more broadly to the organizational goals, including aspects such as enthusiasm and volunteering to carry out duties not formally part of one’s job. It is assumed that personality variables are critical for predicting contextual performance criteria (Arvey & Murphy, 1998).

Work Attitudes

Work attitudes are defined as positive or negative evaluations about aspects of one’s work environment (O’Reilly, 1991). The most common constructs measured by attitude instruments are job satisfaction, commitment, involvement, and stress. Satisfaction refers to an emotional state resulting from job experiences. The questionnaires used to measure job satisfaction can be classified into two groups: measures of overall satisfaction and measures of satisfaction with specific aspects of the job (Peiró & Prieto, 1996). The most frequently measured facets are satisfaction with pay, promotion, supervision, and job content (Gatewood et al., 2000). With regard to commitment, there is no generally accepted definition and measurement. While affective commitment measures include aspects such as loyalty towards the organization,


the effort to achieve organizational goals, and the acceptance of the organization’s values, continuance commitment measures are related to the personal sacrifice associated with leaving the organization and the perceived employment alternatives. Finally, another measure of work attitudes refers to the degree to which job experiences are perceived as stressful. However, caution is needed because self-report measures of stress may be easily inflated by the person’s disposition toward negative affectivity (O’Reilly, 1991).

The Group Level

The work group consists of individuals who see themselves and who are seen by others as a social entity, who are interdependent because of the tasks they perform as members of a group, who are embedded in one or more larger social systems (e.g. organization), and who perform tasks that affect others (Guzzo & Dickson, 1996). Psychological assessment at group level is primarily focused on three aspects: design, processes, and performance.

Group Design

Although a good group design cannot guarantee satisfactory group functioning, it is necessary to facilitate competent group behaviours. It is for this reason that group design should be measured and controlled. Of the different facets of group design (structure of task, group composition, and establishment of norms), group composition has received increasing attention, especially heterogeneity (Guzzo & Dickson, 1996). Group heterogeneity refers to the mix of abilities, personalities, gender, attitudes, background, and demographic characteristics. In order to work effectively, a ‘right mix’ of group members is needed. Efforts have been devoted to assessing the right mix of members in terms of abilities and personality (West & Allen, 1997). This is the case of ‘skill mix’, particularly popular within teams in health service settings, which refers to the efficient balance between trained and untrained, qualified and unqualified, and supervisory and operative staff. Also, personality compatibility can be measured. For instance, according to Schutz’s theory of fundamental interpersonal relations orientations (FIRO), there are three basic needs expressed in group interaction: needs for inclusion, control, and affection. A compatible balance of initiators and receivers of control, inclusion, and affection characterizes effective groups.

Group Process

It is generally assumed that, in addition to group design, the process of interaction among group members affects the effectiveness of the group as a whole. As Hackman (1987) pointed out, assessing group process can pursue different goals. A trained observer can focus on the interpersonal transactions that reflect conscious and unconscious social and emotional forces (e.g. who is talking with whom). Group process assessment can also be focused on the issues of interaction directly related to the work of the group on its task (e.g. the degree to which members’ knowledge and skills are used). Group interaction can result in ‘synergy’; that is, outcomes that are different from those that would be obtained by simply adding up the contributions of individuals (Hackman, 1987). Synergy can be positive (e.g. a very creative solution of a job-related problem) or negative (e.g. a severe failure of coordination). In general, different methods can be used to assess group process. This is the case of some assessment centre techniques (e.g. simulation), where real job tasks are represented and a group of individuals is assessed by a group of judges.

Group Performance

Three criteria are typically used to measure group performance: (1) group-produced outputs, (2) the influence of the group on its members, and (3) the state of the group as a performing unit (see Guzzo & Dickson, 1996; Hackman, 1987). Although some objective indicators of group outputs can be measured (e.g. quantity), objective criteria are only available for a restricted number of work groups in organizations. In general, the assessment from others (e.g. a manager) is more critically associated with the consequences for a group and its members than objective measures. It is for this reason that there is a tendency to assess outputs in terms of satisfaction of the standards of the people (‘clients’) who receive and/or review the output. The second measure is related to the impact of the group on individuals. It is assumed that the cost of generating group outputs is high if its members are dissatisfied. Accordingly, the degree to which the group


experiences satisfy the needs of group members should also be assessed. The third measure reflects the probability that a group performs effectively in the future. Although the present outputs of a group can be satisfactory, it is possible that the social processes by which these outputs are obtained hamper the group as a performing unit, and its members are no longer willing to work together on future tasks.

The Organization Level

Individuals and work groups are embedded in a more general organizational system that can itself be measured. Psychological properties of organizations as a whole, such as culture, climate, and performance, can also be assessed.

Culture and Climate

Although culture and climate have sometimes been used as synonyms, they refer to different concepts. As Schneider (1985) pointed out, culture is a deeper construct than climate. While organizational climate is defined as the shared perceptions of employees related to the practices, procedures, and behaviours that are rewarded and supported in an organization, culture refers to the beliefs, norms, and values underlying the policies and activities, as well as the manner in which the norms and values systems are communicated and transmitted. Consequently, the modes by which culture and climate are assessed are also different. Culture is usually measured by using qualitative and case study methodologies. In contrast, the survey approach is the dominant method in measuring climate (Schneider, 1985).

Organizational Performance

Financial performance and productivity are considered the typical measures of organizational performance as a whole. In addition, other measures associated with customer responses of satisfaction and perceived quality have received increasing attention. While economic measures of performance reflect quantity of outputs, psychological measures of customer evaluations refer to quality of outputs as they are perceived by the customer (Fornell, 1992). Psychological measures offer information that is not included in current-term financial measures (Aaker & Jacobson, 1994). In the absence of alternatives, short-term financial gains are usually used as indicators of long-term prospects. However, the strategies devoted to increasing long-term performance often diminish short-term earnings. The myopic management style, focused on short-term gains, can be corrected by considering non-financial measures. In fact, the measurement of customer perceptions of product quality is able to predict information concerning long-term competitiveness that is not captured by short-term financial measures (see Aaker & Jacobson, 1994).

FUTURE PERSPECTIVES

An Integrated Assessment of Organizations

In the preceding discussion, we have analysed how psychological assessment is implemented at different levels of organizations. However, a more integrated perspective can be considered where the different levels of analysis are interrelated, showing complex interactions. Herriot and Anderson (1997) proposed that the relationships between measures at individual, group, and organizational levels of analysis could show three kinds of patterns: complementary, neutral, and contradictory. The complementary interaction is observed when a high score at one level of analysis is desirable in combination with a high score at another level (e.g. when high interpersonal skills are required for both individual work and group working). The neutral interaction occurs when a high score on a construct is desired at one level and, simultaneously, it is not applicable at another level of analysis (e.g. when interpersonal skills are required for group working, but they are not related to individual performance). Finally, the contradictory interaction is observed when a high score at one level of analysis is desirable in combination with a low score at another level (e.g. when extraversion is desirable for team working, but introversion is positively related to individual performance). Because of their relevance to research and management, future efforts are needed in developing and testing these kinds of approaches. An integrated assessment is able to describe an organization more accurately, given that it serves to diagnose its complex and contradictory character.


Links Between the Context of Organizations and Psychological Assessment

It is generally assumed by managerial orientations that organizations are free to design and implement practices and policies (see Morishima, 1995). However, the external context of the organization influences organizational choices, including the type of procedures and techniques used in psychological assessment. For instance, Rousseau and Tinsley (1997) suggested that the culture of a country (e.g. in terms of individualism vs. collectivism) can be related to the appropriateness of individual versus group measures of performance, as well as to the emphasis on individual-job versus individual-organization or work-group fit measures. Also, Herriot and Anderson (1997) indicated that organizations are now subjected to an environment that changes with increasing speed and unpredictability. In this context, organizations emphasize psychological assessment related to employee flexibility, personality, and potential to innovate. Additionally, it is also reasonable to expect that, in some circumstances, organizations influence their external context. For instance, organizations can demand an education system in which certifications are highly job-related, given that this type of education can facilitate measurement and managerial decisions (e.g. in a selection process). Thus, reciprocal influences between organizations and their contexts can be studied in the future. A contingency approach can be proposed where psychological assessment depends on the characteristics of external contexts and the nature of the relations between these contexts and organizations.

The Political Face of Psychological Assessment in Organizations

Research and practice in organizations espouse a rational perspective in understanding psychological assessment. Organizations are often defined as rational and efficiency-seeking systems, and managers use psychological assessment in order to achieve valued organizational outcomes. However, their political ‘face’ should also be considered. Following this perspective, the organization is seen as a political system with competing groups and interests, each with its own perceptions of organizational realities. The political face is not everything, but it helps to understand some events related to psychological assessment. Additionally, ignoring power in organizations can result in managerial failures and incomplete assessment at different levels of analysis. For instance, there is not only a dominant culture in organizations but also ‘countercultures’ that reflect alternative values. Usually, individuals and work groups that have values and perceptions congruent with those of the organization, especially with the top-management group, also have more power and influence (Friedlander, 1987). Accordingly, it is reasonable to expect that divergent thinking will not be reflected in the measurement of culture. Also, psychological assessment is likely to be used to reinforce and justify the values and perceived tasks of the dominant coalition. Powerful coalitions act within their own reality, which is not necessarily better than other realities constructed within the organization as a whole. Alternative cultures can reflect adaptive values in terms of initiative and creativity. Ignoring these cultures has contributed to long-term disasters in many companies (see Dachler, 1989). Thus, more effort is needed in order to include the diversity of organizational ‘cultures’ in psychological assessment, as well as in studying the impact of power forces and power games on measurement decisions at different levels of analysis.

CONCLUSIONS

A multilevel psychological assessment has important potential benefits. Using this perspective, the great complexity of organizations is diagnosed, given that the organization is considered as an open social system with different measurable subsystems. Psychologists can focus their psychological assessment at different levels of analysis. Thus, this perspective serves to consider both the micro domain’s focus on individuals and groups and the macro domain’s focus on the organization as a whole.

Additionally, the multilevel psychological assessment is enriched if three complementary perspectives are also incorporated in the future. First, a more integrated assessment can be considered, assuming that constructs measured at different levels of analysis can show complex, even contradictory, relationships. Secondly, there is a need to


study the reciprocal influences, in terms of psychological assessment, between organizations and their external contexts. Finally, the political face of organizations should be measured and analysed in order to obtain a richer portrait of psychological assessment in organizations.

References

Aaker, D.A. & Jacobson, R. (1994). The financial information content of perceived quality. Journal of Marketing Research, 31, 191–201.
Arvey, R.D. & Murphy, K.R. (1998). Performance evaluation in work settings. Annual Review of Psychology, 49, 141–168.
Dachler, H.P. (1989). Selection and the Organisational context. In Herriot, P. (Ed.), Handbook of Assessment in Organisations (pp. 45–69). Chichester: John Wiley & Sons.
Fornell, C. (1992). A national customer satisfaction barometer: the Swedish experience. Journal of Marketing, 56, 6–21.
Friedlander, F. (1987). The ecology of work groups. In Lorsch, J.W. (Ed.), Handbook of Organisational Behaviour (pp. 301–314). Englewood Cliffs: Prentice-Hall, Inc.
Gatewood, R.D., Perloff, R. & Perloff, E. (2000). Testing and industrial application. In Goldstein, G. & Hersen, M. (Eds.), Handbook of Psychological Assessment (pp. 505–525). Oxford: Elsevier Science Ltd.
Guzzo, R.A. & Dickson, M.W. (1996). Teams in Organisations: recent research on performance and effectiveness. Annual Review of Psychology, 47, 307–338.
Hackman, J.R. (1987). The design of work teams. In Lorsch, J.W. (Ed.), Handbook of Organisational Behaviour (pp. 315–342). Englewood Cliffs: Prentice-Hall, Inc.
Herriot, P. & Anderson, N. (1997). Selecting for change: How will personnel and selection psychology survive? In Anderson, N. & Herriot, P. (Eds.), International Handbook of Selection and Assessment (pp. 1–34). Chichester: John Wiley & Sons.
Morishima, M. (1995). Embedding HRM in a social context. British Journal of Industrial Relations, 33, 617–640.
Nord, W.R. & Fox, S. (1996). The individual in organisational studies: the great disappearing act? In Clegg, S.R., Hardy, C. & Nord, W.R. (Eds.), Handbook of Organisation Studies (pp. 148–174). London: Sage Publications.
O’Reilly, C.A. (1991). Organisational behaviour: where we’ve been, where we’re going. Annual Review of Psychology, 42, 427–458.
Peiró, J.M. (1985). Psychological Assessment in Organisations. Evaluación Psicológica-Psychological Assessment, 1, 189–239.
Peiró, J.M. & Prieto, F. (Eds.) (1996). Tratado de Psicología del Trabajo (2 vols.). Madrid: Síntesis.
Rousseau, D.M. & Tinsley, C. (1997). Human resources are local: society and social contracts in a global economy. In Anderson, N. & Herriot, P. (Eds.), International Handbook of Selection and Assessment (pp. 39–61). Chichester: John Wiley & Sons.
Schneider, B. (1985). Organisational Behaviour. Annual Review of Psychology, 36, 573–611.
West, M.A. & Allen, N.J. (1997). Selecting for teamwork. In Anderson, N. & Herriot, P. (Eds.), International Handbook of Selection and Assessment (pp. 492–506). Chichester: John Wiley & Sons.

José Maria Peiró and Vicente Martínez-Tur

Related Entries

Culture (Organizational), Leadership in Organizational Settings, Observational Techniques in Work and Organizational Settings, Risk & Prevention in Work & Organizational Settings, Self Reports in Work and Organizational Settings

APPLIED FIELDS: PSYCHOPHYSIOLOGY

INTRODUCTION

The major focus of this entry will be to provide a clear rationale for the application of psychophysiological approaches and methods to areas of applied psychology. We will examine the reasons for their application, the psychological constructs and processes to be assessed, the methods employed, and specific issues concerning applied uses of these techniques. Specific guidance on psychophysiological recording has been dealt with elsewhere, together with entries on brain


activity and ambulatory monitoring. For background reading and a general reference source, Cacioppo, Tassinary & Berntson’s Handbook of Psychophysiology, 2nd Edition (2000) is recommended. Other useful introductory texts include Cacioppo and Tassinary (1990), Hugdahl (1995) and Stern, Ray and Davis (1980).

DEFINITIONS AND CONSTRUCTS

Psychophysiology can be loosely defined as the study of psychological constructs and processes using non-invasive physiological measures (see Cacioppo, Tassinary & Berntson, 2000; Turpin, 1989). Traditionally it is distinguished from physiological psychology by emphasizing the importance of studying the intact and conscious organism, usually in the absence of invasive techniques, which might disrupt and limit consciousness or behaviour. As such, the usual domain of psychophysiology has been the measurement of peripheral autonomic and central cortical measures within human participants studied whilst engaged in psychologically relevant tasks or natural situations. In contrast, physiological psychology has tended to use animal subjects and to measure invasively, usually directly from the nervous system, using implanted electrodes, and frequently employing invasive manipulations such as lesioning, infusion of pharmacological agents, direct stimulation etc. More recently, these boundaries have become less distinct since physiological psychology has been incorporated within the greater multidisciplinary arena of neuroscience, and psychophysiology has been extended by more direct but still non-invasive measures of brain activity and structure such as functional imaging, dense array electroencephalography and magnetography (see Cacioppo et al., 2000). Nevertheless, the cardinal features of psychophysiology, namely the study of psychological processes, largely from human participants and using non-invasive physiological measures, are central to the successful application of the discipline to more applied areas of study.

APPLIED PSYCHOPHYSIOLOGY

Psychophysiology has always been essentially an applied discipline since its identity has had very much to do with the measures employed and their various applications. Recently, Cacioppo et al. (2000) described this as systemic psychophysiology, which refers to the study of the various physiological systems (i.e. electrodermal, cardiovascular, cortical etc.) with respect to measurement, quantification and their relationships to psychological processes and paradigms. Much psychophysiological research has been methodologically focused on validating either specific physiological measures or their use as indices of psychological constructs. Subsequently, these measures have been applied to theoretical questions derived from other branches of psychology including both fundamental and applied research. Traditional areas of application have included psychopathology research and the search for physiological markers of psychological disorder, as well as the development of clinical assessment and outcome measures (Keller, Hicks & Miller, 2000; Stoney & Lentino, 2000; Turpin, 1989). The measurement of stress and cognitive performance using psychophysiological parameters has also meant that these techniques have been used extensively within human factors and ergonomic research (Kraemer & Weber, 2000). Other applied areas where psychophysiological approaches have been adopted have included attitude measurement, applied developmental psychology, environmental and specific polygraphy (i.e. lie detection) applications (Cacioppo et al., 2000).

What are the benefits of using psychophysiological approaches? The answer lies in the range of psychological constructs and paradigms for which psychophysiological indices or measures have been derived. Cacioppo et al. (2000), in addition to describing ‘systemic psychophysiology’, also identified ‘thematic psychophysiology’, which describes topical areas of psychophysiological research. They cited the following examples: cognitive psychophysiology (human information processing and physiological events); social psychophysiology (reciprocal relationships between social systems and physiology); developmental psychophysiology (developmental and ageing processes); clinical psychophysiology (study of disorders); environmental psychophysiology (person–space interactions) and applied psychophysiology (psychophysiological technologies such as biofeedback, lie detection, man–machine instruction etc.). These topics are


exhaustively covered within their handbook. Similarly, we can identify at a more detailed level a myriad of psychological processes and constructs (e.g. attention, attitudes, emotion, memory consolidation) for which there are claimed to be psychophysiological indices or correlates (see Hugdahl, 1995). For example, a class of evoked potential measures of brain activity called the ‘P300’ is said to be associated with a variety of psychological processes surrounding stimulus evaluation, categorization and context updating (Donchin & Coles, 1988). Similarly, evoked potential Mis-Match-Negativity (Näätänen, 1992), cardiac deceleration (Graham, 1979) and the electrodermal response (Siddle, 1983) have all been associated with the detection of mismatches due to changes in stimulus novelty or significance.

It is apparent that psychophysiological correlates exist for a wide range of psychological constructs. The question, therefore, arises as to what advantages psychophysiological assessments present with respect to performance or self-report measures. It is claimed that psychophysiological measures have the following advantages: they are objective and free of either subjective or observer bias, they are continuous and unobtrusive measures, they can accurately indicate the timing of psychological events, and they may indicate the nature of mechanisms underlying the brain–behaviour relationships under study. Within an applied setting, many of these advantages become even more important. The ability to obtain objective and continuous measures which do not require either self-report or observation means that physiological measures indicating psychological changes in either state or processes may be studied in difficult or inaccessible environments. These could range from space flight to studying arousal processes in married couples during naturalistic social interaction (Gottman & Levenson, 1992). The emphasis on objective versus subjective report also means that data may be obtained from individuals with communication difficulties either due to cognitive impairment or age and temperament. Indeed, with respect to many psychological processes, it is argued that a comprehensive understanding is not possible without recourse to physiological measurement. Lang’s classical work (Lang, 1968; Turpin, 1991) on the measurement of anxiety and the three systems approach which utilized behavioural, cognitive and physiological approaches is a prime example of this argument. Moreover, there may be situations where systematic biases might be introduced with respect to self-report (i.e. forensic settings) where the assessment of ‘truth and honesty’ (i.e. lie detection) or the presentation of certain disorders (e.g. Post-Traumatic Stress Disorder) are claimed to be more accurately assessed using psychophysiological techniques. This raises the interesting question as to how objective psychophysiological indices truly are and whether they themselves can be subject to conscious manipulation and bias (Iacono, 2000).

Doubts concerning objectivity are not the only disadvantages to be considered when adopting psychophysiological techniques. Whether claimed psychophysiological indices of putative psychological constructs are either reliable or valid may also be subject to challenge. With respect to reliability, psychophysiological measures might be heavily influenced by the setting and situation in which they are obtained. This may give rise to problems of generalizability if care is not taken to standardize methods, settings, paradigms and materials. Reported test–retest reliabilities vary considerably across different psychophysiological indices (Strube, 2000). Similarly, due to the practical constraints of assessing large numbers of individuals, standardised norms for psychophysiological measures are few and far between. This places very definite psychometric limits on the application of psychophysiology to the single case.

Specific psychophysiological theories are also limited and measures are usually interpreted within the context of other theoretical frameworks from cognitive psychology and elsewhere. Sometimes this results in psychophysiological measures having particular interpretations which are assumed rather than empirically based. An example is whether cardiac deceleration, a common psychophysiological response, should be interpreted as an index of the orienting response, the detection of stimulus novelty or merely stimulus registration (Ohman, Hamm & Hugdahl, 2000). Similarly, psychophysiological constructs can persist even though their empirical basis may be either insubstantial or even contradictory. Perhaps the best example, and one which is commonly used within applied settings, is the notion of arousal.


Arousal is still used as a major explanatory concept in many applied social and clinical settings despite much psychophysiological research, which has been deeply critical of the construct (Gardener, Gabriel & Diekman, 2000; Turpin & Heap, 1998). This can lead to major problems regarding interpretation and construct validity.

Finally there are issues to do with practical utility. Psychophysiological measures usually require complex electronic machinery for physiological measurement, sophisticated computer software for data acquisition and analysis, laboratory environments and trained technicians. These resources are expensive and may not be widely available. Furthermore, the reliance on laboratory settings may also preclude many applied settings. Consequently, many recent applications have relied on the development of ambulatory methods.

Applied Constructs and Uses

As discussed above, there is a wide range of potential applications for psychophysiological measures and approaches. Within the space limitations of this entry it is impossible to present even an overview of different types of applications. However, we will describe some recent examples. Before doing so, a distinction perhaps needs to be made between applied research and research in applied settings. Much psychophysiological research is geared to applied questions relating to psychological understanding of important issues such as health and disease. However, this tends to be laboratory-based experimental research and is directed at using psychophysiological measures to seek answers to fundamentally theoretically relevant questions but with consequences for applied areas. For example, there has been an impressive growth in studies employing the potentiated startle paradigm as a method of assessing emotional valence, and anxiety in particular (Lang, Bradley & Cuthbert, 1990). At a theoretical level, this research has increased understanding of how fear cues are processed at both conscious and pre-attentive levels, and the possible neural substrates underlying some of these mechanisms (Lang, Davis & Ohman, 2000). The question arises, therefore, whether these techniques can be transferred into an applied setting and used for more practical purposes. Could measures of potentiated startle be used to discriminate between different diagnostic groups of anxiety disorders, could they accurately track response to treatment and indicate therapeutic outcomes and gains? Unfortunately, there are in reality few areas of psychophysiology which are used routinely in professional psychology practice. Perhaps the only real examples are biofeedback treatments and polygraphy. Nevertheless, major areas of psychophysiological endeavour such as evoked potential research influence practical applications in other areas such as clinical neurology or audiometry.

Common clinical research applications of psychophysiological measures have been as measures of attention within schizophrenia: these have included electrodermal measures of orienting, P300 type event-related potentials (EP) and early sensory gating EPs (Miller et al., 2000). Recent applications have used dense array EEG to look at the lateral distribution of brain activity, especially over prefrontal cortex, and its relationship to affective processing and depression (Davidson, 1992). Anxiety disorders research has focused on the potentiated startle paradigm (Lang et al., 1990), as described above, together with studies of autonomic balance within Generalized Anxiety Disorders (Thayer & Lane, 2000). Therapeutic applications of psychophysiology continue in the form of studies of relaxation and meditation (Turpin & Heap, 1998) and biofeedback (Schwartz, 1995).

Psychophysiological studies within the discipline of health psychology continue to examine mechanisms underlying cardiovascular disease (Stoney & Lentino, 2000). Studies aimed at assessing cardiovascular reactivity to psychologically challenging events continue to be performed (e.g. Fredrickson & Matthews, 1990). A particular focus is the relationship between laboratory-based studies and ambulatory-monitoring based studies of reactivity. Psychophysiological measures have been particularly adopted to assess the role of stress in contributing to the aetiology and maintenance of common physical conditions. In addition to the usual autonomic measures such as heart rate and blood pressure reactivity, many studies examining ‘stress’ exploit techniques from psycho-immunology and endocrinology, using biochemical assays of immune or hormonal status (Uchino, Kiecolt-Glaser & Glaser, 2000).


Human factors psychophysiology has traditionally examined problems such as assessing alertness and sleep quality, mental workload and performance, and man–machine interactions. A full range of measures has been employed, from endocrinological assays (Lovallo & Thomas, 2000) to evoked potential applications to man–machine interactions. Spectral analysis of physiological parameters over extended periods of time or different activities is a technique frequently employed in ergonomic applications. Mulder (1992), in particular, has exploited measures of heart rate variability to assess attentional and workload factors.

CONCLUSION

Psychophysiology has a long tradition of being used within applied settings. Advances in technology have broadened the range of settings in which psychophysiological measures can be obtained. Developments in neuro-imaging (e.g. Reiman, Lane, Van Petten & Bandettini, 2000) also mean that psychophysiological techniques can now address exciting questions of functional brain–behaviour relationships. Hopefully, these techniques will be extended so as to include more applied questions and applications.

References

Cacioppo, J.T. & Tassinary, L.G. (Eds.) (1990). Principles of Psychophysiology: Physical, Social, and Inferential Elements. Cambridge: Cambridge University Press.
Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.) (2000). Handbook of Psychophysiology. Cambridge: Cambridge University Press.
Coles, M.G.H., Donchin, E. & Porges, S.W. (Eds.) (1986). Psychophysiology: Systems, Processes and Applications. New York: Guilford.
Davidson, R.J. (1992). Anterior cerebral asymmetry and the nature of emotion. Brain and Cognition, 20, 125–151.
Donchin, E. & Coles, M.G.H. (1988). Is the P300 component a manifestation of context updating? Behavioural and Brain Sciences, 11, 354–356.
Fredrickson, M. & Matthews, K.A. (1990). Cardiovascular responses to behavioural stress and hypertension: a meta-analytic review. Annals of Behavioural Medicine, 12, 30–359.
Gardener, W.L., Gabriel, S. & Diekman, A.B. (2000). Interpersonal processes. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 643–664). Cambridge: Cambridge University Press.
Gottman, J.M. & Levenson, R.W. (1992). Marital processes predictive of later dissolution: behaviour, physiology, and health. Journal of Personality and Social Psychology, 63, 221–233.
Graham, F.K. (1979). Distinguishing among orienting, defense and startle reflexes. In Kimmel, H.D., Van Olst, E.H. & Orlebeke, J.F. (Eds.), The Orienting Reflex in Humans (pp. 137–167). Hillsdale, NJ: Erlbaum.
Hugdahl, K. (1995). Psychophysiology: The Mind-Body Perspective. Cambridge: Harvard University Press.
Iocano, W.G. (2000). The detection of deception. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 772–793). Cambridge: Cambridge University Press.
Keller, J., Hicks, B.D. & Miller, G.A. (2000). Psychophysiology in the study of psychopathology. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 719–50). Cambridge: Cambridge University Press.
Kramer, A.F. & Weber, T. (2000). Applications of psychophysiology to human factors. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 794–813). Cambridge: Cambridge University Press.
Lang, P.J. (1968). Fear reduction & fear behaviour: problems in treating a construct. In Shlien, J.M. (Ed.), Research in Psychotherapy. Washington, DC: American Psychological Association.
Lang, P.J., Bradley, M.M. & Cuthbert, B.N. (1990). Emotion, attention, and the startle reflex. Psychological Review, 97, 377–98.
Lang, P.J., Davis, M. & Ohman, A. (2000). Fear and anxiety: animal models and human cognitive psychophysiology. Journal of Affective Disorders, 61, 137–59.
Lovallo, W.R. & Thomas, T.L. (2000). Stress hormones in psychophysiological research. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 342–367). Cambridge: Cambridge University Press.
Mulder, L.J.M. (1992). Measurement and analysis of heart rate and respiration for use in applied environments. Biological Psychology, 34, 205–236.
Näätänen, R. (1992). Attention and Brain Function. Hillsdale, NJ: Erlbaum.
Ohman, A., Hamm, A. & Hugdahl, K. (2000). Cognition and the Autonomic Nervous System. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 533–75). Cambridge: Cambridge University Press.
Reiman, E.R., Lane, R.D., Van Petten, C. & Bandettini, P.A. (2000). Positron emission tomography and functional magnetic resonance imaging. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 85–114). Cambridge: Cambridge University Press.


Schwartz, M.H. (1995). Biofeedback – A Practitioner’s Guide (2nd ed.). New York: Guilford Press.
Siddle, D.A.T. (1983). Orienting and Habituation: Perspectives in Human Research. Chichester, UK: Wiley.
Stern, R.M., Ray, W.J. & Davis, C.M. (1980). Psychophysiological Recording. New York: Oxford University Press.
Stoney, C.M. & Lentino, L.M. (2000). Psychophysiological applications in clinical health psychology. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 751–771). Cambridge: Cambridge University Press.
Strube, M.J. (2000). Psychometrics. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 849–869). Cambridge: Cambridge University Press.
Thayer, J.F. & Lane, R.D. (2000). A model of neurovisceral integration in emotion regulation and dysregulation. Journal of Affective Disorders, 1, 000–000.
Tassinary, L.G. & Cacioppo, J.T. (2000). The skeletomotor system: surface electromyography. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 163–199). Cambridge: Cambridge University Press.
Turpin, G. (Ed.) (1989). Handbook of Clinical Psychophysiology. Chichester: Wiley.
Turpin, G. (1991). The psychophysiological assessment of anxiety disorders: three-systems measurement and beyond. Psychological Assessment, 3, 366–75.
Turpin, G. & Heap, M. (1998). Arousal reduction methods: relaxation, biofeedback, meditation and hypnosis. In Hersen, M. & Bellack, A. (Eds.), Comprehensive Handbook of Clinical Psychology. Adults: Clinical Formulation and Treatment, Vol. 6 (pp. 203–227). London: Elsevier.
Uchino, B.N., Kiecolt-Glaser, J.K. & Glaser, R. (2000). Psychophysiological modulation of cellular immunity. In Cacioppo, J.T., Tassinary, L.G. & Berntson, G.G. (Eds.), Handbook of Psychophysiology (pp. 397–424). Cambridge: Cambridge University Press.

Graham Turpin

Related Entries

Ambulatory Assessment, Anxiety, Anxiety Disorders, Psychophysiological Equipment & Measurements, Brain Activity Measurement, Equipment for Assessing Basic Processes

APPLIED FIELDS: WORK AND INDUSTRY

INTRODUCTION

Very broadly one might say that wherever people are busy there is a chance and a need for psychological assessment. However, it is impossible to name all fields in work and industry which are open for psychological assessment. The psychological assessor just has to look at the world of work and industry around him in order to find out what he might contribute. This may be done in terms of theories and constructs which allow one to evaluate work and industriousness. This may be done by instruments which operationalize constructs and measures that are reliable and valid. This may be done in terms of methods, designs, and results to present to a customer or a team of experts.

One approach to systematize assessment in applied fields in general, and of work and organization in particular, is to take an Individual, Group, or Organizational perspective.

INDIVIDUAL PERSPECTIVE

Starting with assessing the Individual within a company or an organization one might question ‘what, when, what for’: Of course, psychological assessment is of interest in order to learn more about the individual’s strengths and weaknesses, about his attitudes and beliefs, and about his competencies and potentials. Here, methods like mental tests, reaction time studies, occupational personality scales (Ones & Viswesvaran, 2001), motivation scales, and opinion questionnaires are called for. The aim is to describe a person as fully as is needed to evaluate how she or he will do (well) on a prospective job. Thus data at job entry are


used to forecast the ‘zone of proximal development’ of an applicant. One has to recognize (see Furnham, 2001) that an individual:

• chooses a job based on pay, location, job security, and training based on his personality traits, attitudes, and values.
• adapts to a job out of necessity, insight, motivation
• changes a job by altering the physical and social environments.
• evolves with new technology, markets and global requirements according to what he understands are necessary requirements in the future.

All this is open for assessment. But assessment of an individual does not stop at job entry. Any job confronts incumbents with a variety of minor and major challenges. One of them is to learn to function well at a certain position. Thus learning gains or developing several competencies are of interest to assessors. Assessment results help to improve the interaction between the individual and the workplace by considering human factors for improved functioning, by motivating the individual, by designing up to date remuneration schedules, by considering aptitude treatment interactions in designing effective training programs, by monitoring communication and coordination with others, by communication and coordination programs to name but a few.

A new aspect for assessment emphasizes licensing professionals as an aspect of overall quality assurance in production and service. Companies may want coworkers who have knowledge, skills, and competencies to deal with their products within the company itself but, even more important, at all customer sites. The service person for a database product of a regional bank may create quite a loss if a new program release is not handled with care. This is part of the liability movement in modern societies which assures that products and services do not do any harm to others. Here, with each new product and each new service there has to be a model of proper use and a contingent assessment of its components. So assessment takes place in regard to accreditation and licensing.

During a person’s professional life there are numerous occasions to assess what an incumbent’s profile of competencies is like, or to find out about the set of strengths and weaknesses in order to assign someone to a proper position for the sake of himself and the benefit of the organization. Placement decisions should be based on sound assessment data.

Even at the end of a career, assessment may help to find a new position outside the organization by means of outplacement or early retirement plans. One might also have to look at the loss or weakening of competencies and skills over time and find means and measures to decide about rehabilitation programs. Here, it is of interest what residual competencies are available, to which degree, and how they should be built upon in a rehabilitation training.

Seen as such, psychological assessment is a work-life-long companion activity which serves the individual and the organization in order to fruitfully monitor the interaction between both of them. The psychological well-being of the individual is a target, as is the reasonable use of his forces at work. Assessment emphasizes prerequisites to job demands, trainings, and personality developments. However, it also emphasizes effects of all the aforementioned after a new job was assigned, a training was accomplished, and a personal challenge was taken. Assessment data are vital to human resource management and thus have to be valid, reliable, and objective in the first place to sustain all personnel decisions that are taken.

(Related entries of this volume: Achievement motivation ass., achievement testing, affect ass., biographical measures in work ass., burnout ass., Assessment centres, cognitive abilities or mental aptitudes ass., creativity ass., emotional intelligence ass., ass. of intelligence, interest ass., interview, locus of control ass., motivation ass., observational methods, personality ass., ass. in personnel selection, portfolio ass., practical intelligence ass., problem solving ass., ass. of reasoning, self efficacy ass., self reports, temperament ass., ass. of personal constructs, personality ass., vocational ass.)

GROUP PERSPECTIVE

Assessment at the group level is mainly oriented towards productivity, conflict resolution, good communication, and coordination. One may want to look at the social functioning of a team by means of a sociogram (Moreno, 1951; see


Sociometric Methods this volume), by means of interaction analysis (Bales, 1950, SYMLOG), by means of a questionnaire about role ambiguity (Rizzo et al., 1970), or by observation studies in an obtrusive or non-obtrusive manner (Putnam and Jones, 1982).

Some assessments are status oriented and should allow one to judge what are the prevailing attitudes or obstacles in group life in order to go from here to improve it. Actions may involve changes in group memberships, group trainings, or re-groupings at large.

More of a process-oriented approach is called for if monitoring of actions is of interest. Longitudinal assessment data are needed to describe what changes take place in a group and explain why these changes occur. Cross sectional data reveal how different groups develop independently from each other. Harrison and Shirom (1998: 161) present some key group factors: (1) Group Composition, Structure and Technology like nationality mix, divergence of professions, decision procedures, control procedures like evaluation, comprehensiveness of controls, and (2) Group Behaviour, Processes, and Culture like relations among members, reward types, direction of information flows, openness, decision making, supervisory behaviour (supportiveness, participation, goal setting, performance expectation, conflict management).

Topics may range from modern shift systems, remuneration schedules, new production techniques, new forms of cooperation and coordination, integrating minorities, client centredness of work, quality assurance at each production step, self-organization of the team, group cohesion, role conflicts/clarity, mobbing propensity, co-worker–supervisor relationship, to learning needs. This list is by far not complete but it displays minor and major topics which may be subject to an in-depth assessment. Practical problems are closely linked to some kind of sometimes political action on behalf of the management and the labour union representatives.

(Related entries of this volume: applied behavioural analysis, attitudes ass., coping style ass., ass. of couples, dynamic ass., goal attainment ass., ass. of groups, irrational beliefs ass., leadership ass., learning strategies ass., ass. of minorities, personal constructs ass., observational methods, self reports, time orientation ass., ass. of wisdom, work performance ass.)

ORGANIZATIONAL PERSPECTIVE

Organizational assessment is by far more macroscopic than the foregoing two approaches (see Harrison and Shirom, 1998). First, it has to be defined what the organization under scrutiny is. Some of them are small shops in a small region and others are global players operating on quite diverse markets. Second, the perspective may change if one considers an organization from within, its inner dynamics, its members, in contrast to considering clients, suppliers, and organization members at the same time.

In order to assess, i.e. describe, an organization’s climate, for example, quite different actions have to be taken. One has to look at what attracts people to an organization, what keeps them within, and what are the typical characteristics of those who are there for a given time (Schneider, 1987). So even for personnel marketing and in recruitment campaigns one may want to use self-assessment instruments (see Self Assessment in this volume). Pritchard and Karasick (1973) provide a scale with eleven dimensions like Autonomy, Conflict vs. Cooperation, Social Relations, Structure to name but a few. Based on this and other research, James and James (1989) provided a model which emphasizes (1) role stress and lack of harmony, (2) challenge at work and autonomy, (3) facilitation by leadership and support, (4) cooperation, friendliness, and warmth in a team. As Weinert (1998) points out, these factors are related to roles, leadership, and teams.

Organizational culture (Schein, 1985; see Assessment of Organizational Culture this volume), an adjacent construct, emphasizes common shared values, norms, goals, beliefs, and perspectives. Thus here the scope is on meaning, intentions, purpose of work and tasks, as well as on methods to achieve organizational essentials and underlying norms and values in all that members do. Artifacts and behaviour patterns are by far more visible than beliefs, cognitions, and basic assumptions within a company. Sackman (1992) referred to cognitive orientations as part of organizational culture and identified four forms:

• dictionary knowledge – definitions of labels and definitions
• directory knowledge – assumptions of how common practices work and what are presumably causal relations


• recipe knowledge – prescriptions for improving remedy processes or urgencies
• axiomatic knowledge – about nature of things and why events occur.

The reasons for an assessment are manifold, too. There may be a constant interest in changes of the organization, a need to assess the organization prior to or as a consequence of a re-organization, an in-depth view of what merging with another company had as effect, to assure that new products and production techniques are adopted by the workforce, to find out how new markets would affect the members, and what challenges are perceived in the light of new clients. The scope is always to find out something about the organization as a whole. Most of this will be assessed by means of questionnaires, but some is discernible by interviews, observation, or unobtrusively browsing through documents, self-reports, marketing material, and guidelines. More qualitative than quantitative results are likely with the latter.

An investigation may be launched at the beginning of a change in organizational behaviour or the end of a campaign. In particular, many questionnaire-based actions are meant to shed light on aspects the management wants to emphasize. So the questions are one means to convey to the workforce what is considered essential to the organization. The questions altogether convey a message as such, and subsequent results tell everyone the degrees to which essentials are shared. If, for example, there are several questions about cooperation formats then the responders are geared to particularly perceive this construct and evaluate their momentary reflections on this. Thus the questionnaire is highlighting a concept which may be on the organization’s agenda.

Scaffoldings of how to organize an assessment are given by the Open-Systems Analysis (Harrison & Shirom, 1998), Six Box Model (Weisbord, 1976), Stream Analysis (Porras, 1987) just to name a few.

ASSESSMENT INSTRUMENTS

There are published instruments which allow even standardized interpretation. But their drawback may be that they do not address the present problem and thus do not answer the question raised fully. In the case of assessment of an individual’s behaviour there are numerous instruments from Differential Psychology. But if one addresses group and organizational problems one finds less and less formalized and standardized instruments. One help may be ‘instruments’ shared among psychological assessors who worked out a scale, evaluated it at one site or within one company, but made it available to others. At least some kind of documentation about intended scope, design, and small scale results and evaluations is available (Drasgow and Schmitt, 2001).

So ad hoc instruments are created by internal staff or outside consultants. Often a sound explication and elaboration of constructs is missing. Some instruments lack a theory-based pre-evaluation of questions to be asked. This is sacrificed to immediate results because market forces drive the management to decide. One may definitely wish that even a ‘simple’ questionnaire is considered and valued as a measuring instrument in itself. It provides sound data only if it has been designed and developed according to goals, established theory, constructs, and empirical results. In ‘rapid practice’ questions are ambiguous and so are results. Often the questionnaire falls short of a sound coverage of facets and so data are incomplete or highly one-sided.

There are, of course, good guidelines (Fleishman and Quaintance, 1984) as to how to construct a good measure. Many instruments ought to be based on sound job descriptions to pre-define relevant target behaviours, task-related competencies, and job-related social skills (see Job Characteristics Assessment in this volume). Also (item and/or person) sampling techniques (Shoemaker, 1973) allow one to save costs, at the expense of not asking everyone; these savings should be invested in instrument design and evaluation.

As was mentioned above, apart from questionnaires there may be used interviews, observations, survey-feedback approaches, simulations, grid-techniques (Jenkins, 1998), and scenario techniques for example. However, the less standardized they are, the more assessment errors may be committed. In general, any instrument should be designed closely for the purpose it has to serve. Ad hoc instruments should be avoided, but instruments with some empirical underpinning should be preferred. The former only allow


an assessment per fiat and the latter an assessment per fact.

ASSESSMENT DESIGNS

Designs of how to conduct an evaluation study (Cook & Campbell, 1979; Sanders, 1994) have long been available. But in regard to sound assessment of effects of introduced changes at the person, group, or organizational level there ought to be more than one measure of an effect, and even a pre-measure should be available as a standard against which one may judge any changes. Restructuring of an organizational unit is quite an investment, and it is desirable to trace back to a prior measure what and how much has changed. Often enough an effect is ascertained but vanishes over time. So more than one post measure is advocated. Designs may be borrowed from Educational Psychology (Campbell & Stanley, 1963) to assure that assessed changes are true changes and not just valid for a short time.

Not only is it possible to sample individuals, but content areas can be sampled as well (Shoemaker, 1973; Hornke, 1978) in order to have a sound picture. It is not necessary to ask everyone the same questions, and have many duplicated answer patterns. Good design of individuals and content samples yields sufficiently reliable and valid data and will help to save costs quite a bit. It just demands a bit of prior construct knowledge, some speculation about possible effects, and a kind of intelligent logistic in regard to data collection. Not every all-embracing survey is worth its efforts and investments. Sometimes a sound less is much more!

FUTURE PERSPECTIVES AND CONCLUSIONS

The initial and implicit question, of what the fields of psychological assessment are in regard to work and organization, can only be answered at a surface level. It is left to the ingenious assessor and his efforts, interests, and creativity to sense what the fields of assessment activities are. No one assigns them to him and even a contract allows for sound science-based assessments the contractor himself might not have had in mind. Applied fields in this sense are all those fields which help to improve an individual’s, a group’s, and an organization’s life. The latter is for the benefit of all of them.

References

Bales, R.F. (1950). A set of categories for the analyses of small group interaction. American Sociological Review, 15, 146–159.
Campbell, D.T. & Stanley, J.C. (1963). Experimental and quasi-experimental designs for research on teaching. In Gage, N.L. (Ed.), Handbook of research on teaching. Chicago, IL: Rand McNally.
Cook, T.D. & Campbell, D.T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Chicago, IL: Rand McNally.
Drasgow, F. & Schmitt, N. (Eds.) (2001). Measuring and Analyzing Behaviours in Organizations. San Francisco: Jossey Bass Publishers.
Fleishman, E.A. & Quaintance, M.K. (1984). Taxonomies of Human Performance: The Description of Human Tasks. Orlando, FL: Academic.
Furnham, A. (2001). Personality and individual differences in the workplace: Person-Organization-Fit. In Roberts, B.W. & Hogan, R. (Eds.), Personality in the Workplace (pp. 223–252). Washington, DC: American Psychological Association.
Harrison, M.I. & Shirom, A. (1998). Organizational Diagnosis and Assessment: Bridging Theory and Practice. Thousand Oaks, CA: Sage Publications.
Hornke, L.F. (1978). Personen-Aufgaben-Stichproben. In Klauer, K.J. (Ed.), Handbuch der pädagogischen Diagnostik, Band 1. Düsseldorf: Schwann.
James, L.A. & James, L.R. (1989). Integrating work environment perceptions. Explorations into the measurement of meaning. Journal of Applied Psychology, 74, 739–751.
Jenkins, M. (1998). The Theory and Practice of Comparing Causal Maps. London: Sage.
Moreno, J.L. (1951). Sociometry, Experimental Method and the Science of Society. New York: Beacon House.
Ones, D.S. & Viswesvaran, C. (2001). Personality at work: criterion-focused occupational personality scales used in personnel selection. In Roberts, B.W. & Hogan, R. (Eds.), Personality in the Workplace (pp. 63–92). Washington, DC: American Psychological Association.
Porras, J.I. (1987). Stream Analysis. Reading, MA: Addison-Wesley.
Pritchard, R.D. & Karasick, B. (1973). The effects of organizational climate on managerial job performance and job satisfaction. Organizational Behaviour and Human Performance, 9, 110–119.
Putnam, L.L. & Jones, T. (1982). Reciprocity in negotiations: an analysis of bargaining interaction. Communication Monographs, 49, 171–191.
Rizzo, J.R., House, R.J. & Lirtzman, S.I. (1970). Role conflict and ambiguity in complex organisations. Administrative Science Quarterly, 15, 150–153.

[8.8.2002–12:29pm] [1–128] [Page No. 92] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword Assessment Process 93

Roberts, B.W. & Hogan, R. (2001). Personality in the Workplace. Washington, DC: American Psychological Association.
Sackman, S. (1992). Cultures and subcultures: an analysis of organizational knowledge. Administrative Science Quarterly, 37, 140–161.
Sanders, J.R. (Ed.) (1994). The Program Evaluation Standards (2nd ed.). Thousand Oaks, CA: Sage Publications.
Schein, E.H. (1985). Organizational Culture and Leadership. San Francisco, CA: Jossey-Bass.
Schneider, B. (1987). The people make the place. Personnel Psychology, 28, 447–479.
Shoemaker, D.M. (1973). Principles and Procedures of Multiple Matrix Sampling. Cambridge, MA: xxx
Weinert (1998). Organisationspsychologie (4th ed.). Weinheim: Psychologie Verlagsunion.
Weisbord, M.R. (1976). Diagnosing your organizations: six places to look for trouble with or without theory. Group & Organizational Studies, 1, 430–447.

Lutz F. Hornke

Related Entries

Cognitive/Mental Abilities in Work & Organizational Settings, Interview in Work & Organizational Settings, Motor Skills in Work Settings, Observational Techniques in Work and Organizational Setting, Performance in Work Settings, Physical Abilities in Work Settings, Risk & Prevention in Work & Organizational Settings, Self Reports in Work and Organizational Settings.

A ASSESSMENT PROCESS

INTRODUCTION

In solving daily life problems, we automatically make many judgements and decisions. We also gather information or consult others in order to make well-informed decisions and judgements. The assessment process in the field of psychology is about the gathering and processing of information by a professional in order to reach well-informed judgements and decisions concerning a specific request made by a person or an organization. The client is either a person or an organization that made the request; the subject is the person or organization who is the target of the assessment. Psychological assessment refers to the judgements and decisions made by the professional psychologist. Assessment process refers to how these judgements and decisions came about and how they are communicated to the client.

Contrary to the layperson, the professional has the obligation to process his or her judgements and decisions according to three sets of standards: ethical standards, social standards, and methodological standards. Ethical and social standards apply to all fields of professional psychology. It is with respect to the methodological standards that the professional gets his or her identity as an academically educated expert in a particular field. Most methodological standards in the field of assessment published in the standards of the professional organizations are related to the methods and procedures the psychologist uses in collecting information. Standards or guidelines with respect to the assessment process are not so well articulated. Actually, it is only recently that the European Association of Psychological Assessment installed a Task Force to formulate Guidelines for the Assessment Process (GAP) (Fernández-Ballesteros, 1998).

This entry contains five sections. The first section highlights the distinction between assessment and testing. The second section analyses the assessment process. The third section mentions some of the biases that may disturb the intrinsic validity of the process and mentions some remedies proposed in the literature. The fourth section points to developments in the field that try to model the assessment process. The last section pays attention to the most recent contribution to the field, which is the production of professional guidelines for the assessment process.

ASSESSMENT AND TESTING

The relatively late attention to the quality of the assessment process might partly be explained by the dominant position of the psychometric approach in assessment. Psychometrics is the discipline that deals with the formal statistical foundation of measuring and validating individual differences. In the field of applied psychometrics this tradition focuses on two issues: the development of psychometrically sound tests and the validation of these tests with respect to external criteria. A test is psychometrically sound when it proves to be an objective, quantitative, and reliable measure of individual differences. It is psychometrically valid when its scores predict the position of the examinee on some other criterion or characteristic. In order to be accepted as a test, the instrument must be constructed and validated according to the prescriptions of the existing psychometric theories (see Allen & Yen, 1979, and Nunnally, 1978, for further documentation). The psychometric tradition has proven very valuable, and both test theory and testing are integrated in the academic education of assessors. Moreover, the tradition has witnessed distinguished scholars who published fine books on testing and test use (Anastasi & Urbina, 1997; Cronbach, 1990).

Assessment is a summary term which refers to all the activities the assessor performs in producing an answer to the client’s request. These activities may include testing among other activities, such as analysing the client’s problem, generating hypotheses about its causes, and searching for the appropriate intervention. It is the analytical and constructive quality of the assessment process that distinguishes assessment from mere testing.

THE PROCESS

The assessor has to analyse the request and to integrate his results in a case formulation, which takes into account the available knowledge in the field. In doing so he has to follow the same kind of logic any scientific researcher follows in deductively inferring hypotheses, in testing these hypotheses, and in formulating conclusions in the framework of the available knowledge. However, although the assessment process follows the same kind of logic as any scientific search process, the context differs basically from that of the scientific process (De Groot, 1969; Sloves et al., 1979). For the scientific researcher in psychology the context of his or her work is the body of knowledge in a particular domain, and the researcher is focused upon phenomena not yet explained within that particular domain. The goal of the scientific researcher is to find descriptions and explanations that generalize across persons and situations. It is not the concrete person who is the subject, but general phenomena such as perception, motivation, or personality dimensions. The assessor, however, focuses on the person with his or her particular problems in his or her past as well as present situation. The primary goal of the process is to contribute to the solution of a person’s problems. The more the person’s problems can be described and understood as representative of problems shared by other persons, the more the assessor can rely on a common body of knowledge and apply procedures and protocols developed for specific groups of clients. However, in many cases the assessor cannot just apply already established knowledge. Instead, he or she has to rely on his or her methodological and professional experience in using the state of the art in the field to design a client-tailored procedure and to make an educated interpretation of the outcome.

When talking about the client’s problem, it is important to make a distinction between adjustment problems and clinical problems. By problem is meant a psychological state of uncertainty for which neither the client nor his or her social network sees a preferred course of action. Adjustment problems are problems all people encounter in their daily life, and for which they may want to seek professional advice. Examples of such problems are marital conflict, study choice, and career planning. Clinical problems are problems that have dysfunctional effects on the psychological and social well-being of the client. In assessing adjustment problems, the assessor uses instruments and applies knowledge that belongs to the domain of general psychology. In assessing clinical problems, the assessor uses tools and knowledge that pertain to the domain of clinical psychology.

An important part of the assessment process is the explanation to the client of why and how the assessment tools are applied and how strong the evidence is, which may be the outcome of the process. The kind of assessment tools and
knowledge involved are triggered by the requests the assessor has to answer. The simplest format to describe such requests is that of a question as if phrased by the client. Examples of such questions within the non-clinical domain are: ‘Am I suited for this type of job?’, ‘Which qualities do I have to develop in order to be eligible for this particular education program?’, ‘What conditions at the workplace are responsible for the high rate of job turnover?’. Examples of questions in the clinical domain are: ‘How serious are my anxieties?’, ‘Why does it happen to a person like me to be burnt out?’, ‘Do I need psychological treatment to master my feelings of self-worthlessness?’.

Concrete requests and related questions automatically specify the kind of assessment activities the assessor should perform in order to answer the questions. For instance, in order to answer the question ‘How serious are my anxieties?’, the assessor first has to describe the anxieties and, secondly, he or she has to evaluate the anxieties against some standard or norm of severity. In answering the question ‘What conditions are responsible for the labor turnover?’, the assessor first has to check whether the turnover is unusually high (again evaluation against a standard). Secondly, when the latter is the case, he or she has to hypothesize about conditions and, thirdly, to test these hypotheses by collecting data and evaluating the outcome.

Whatever the steps taken in the process, the process ends in advice to the client. The oral and written report of the course and outcome of the assessment process must give the client a fair and evidence-based account of the given advice. The assessor should be careful in conveying the probabilistic and conditional nature of his or her statements.

FLAWS AND BIASES

The assessment process contains many instances in which the assessor, alone or in dialogue with the client, determines the course of action. The assessor should be aware of and protect him- or herself against the flaws and biases of clinical judgement. Clinical judgement refers to informal and subjective thinking and decision making. There is ample evidence that the professional psychologist who is not armed with proper decision aids is as weak a decision maker as the less-trained professional or layperson.

The studies which demonstrate the fallibility of clinical judgement and decision making belong to three different research streams, which can be labelled the psychometric, cognitive, and social-psychological traditions. The psychometric tradition offers evidence that clinical prediction is nearly always less accurate than a prediction made by standardized formal procedures. Meehl already drew this conclusion in 1954, and it has since been supported by many other reviews, the most recent one presented by Grove et al. (2000). If one wants to predict a person’s state of mind or behaviour in the future, the best thing to do is to base the prediction on the outcomes of empirical studies of the relationship between predictor (present state) and criterion (future state).

The cognitive research tradition presents evidence that cognitive heuristics which allow people to operate rather well in their daily lives may nevertheless have distorting effects in dealing with restricted and probabilistic information. Since the seminal work of Tversky and Kahneman (1974) the distorting effects of cognitive heuristics have been demonstrated in all kinds of choice and decision situations and with all kinds of people, professionals as well as laypersons (see Baron, 1994, and Goldstein & Hogarth, 1987, for a review). Of special critical interest for the assessment process are the heuristics people use in the generation and testing of hypotheses. One of the most famous heuristics in this respect has been called the confirmatory test strategy. People have a strong tendency to test hypotheses by searching for information that confirms the hypothesis and to neglect searching for information that would disconfirm it.

The social-psychological tradition presents evidence that in meeting the client the clinician is inclined to select and interpret information from the perspective of his causal attributions, stereotypes, and characteristics. Of specific interest to the assessment process is the actor–observer bias hypothesized by Jones and Nisbett (1971) and empirically demonstrated in several studies (see Turk & Salovey, 1988, for a review). In explaining their behaviour actors tend to attribute it to situational factors, while observers tend to attribute this behaviour to internal causes like traits and motives.
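
The psychometric point made in this section, that a prediction is best based on the empirical relationship between predictor and criterion, can be made concrete with a minimal sketch. It fits a least-squares line to past cases and then applies the same formal rule to every new case. The function names and all numbers are hypothetical illustrations, not taken from Meehl (1954) or Grove et al. (2000).

```python
def fit_line(predictor, criterion):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, score):
    """Apply the fitted rule mechanically to a new predictor score."""
    return intercept + slope * score


# Hypothetical past cases: predictor = present state, criterion = future state
past_predictor = [1.0, 2.0, 3.0, 4.0]
past_criterion = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(past_predictor, past_criterion)
new_case_prediction = predict(slope, intercept, 5.0)
```

The rule is estimated once from data and then applied identically to each case, which is what distinguishes it from an informal, case-by-case clinical judgement.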

Studies of flaws and biases have automatically led to the question of how these flaws and biases could be avoided or at least restricted. Several proposals have been made, ranging from further standardization of data collection and empirical validation of the prediction procedures involved up to debiasing reasoning techniques and computerized decision aids. Lists of such proposals are given in Garb (1998), Haynes and O’Brien (2000), and Turk and Salovey (1988).

MODELLING THE PROCESS

In many fields of professional psychology, one has always been well aware of the intricacies and fallacies of an assessment process that is not protected somehow against the flaws and biases of clinical judgement. Considerable progress has been made in standardizing the way in which information can be gathered by using reliable and valid tests by which a client’s response can be compared with that of others. However, it is not only the data collection and statistical interpretation that should proceed properly; the same should apply to the comprehensive assessment process, which starts with the client’s requests and ends in the assessor’s advice to the client.

In non-clinical domains, such as job and curriculum selection, the client’s requests relate to the client’s strengths and weaknesses with respect to a certain job or study curriculum. Here the relevant empirical body of knowledge is the relationship between the client’s characteristics and the success or satisfaction in the job or curriculum at hand. What emanates from this empirical approach is, technically speaking, a multiple regression equation in which the scores on a standardized battery of tests are weighted according to their relationship with the criterion, and combined in such a way that the prediction of the criterion is as accurate as possible. The assessment process is modelled after a statistical prediction model. The assessment process reaches a level of standardization that equals the level of standardization of each of its components.

Uncertainty about which job or study to engage in most often presents a problem of choice. Not only the probability of success in each of the choice options is at stake, but also the value each of these options has for the client. The value of having success in a particular career is not restricted to financial profits, but also depends upon more personal values such as social recognition, social identity, and emotional and intellectual satisfaction. The assessment process should result in advice in which probability of success is weighted by the value of that success. In the utility model the assessment process is formalized as the combination of probability and values that apply to each of the choice options (Baron, 1994; Von Winterfeldt & Edwards, 1986).

Neither the statistical model nor the utility model was developed to model the full assessment process, which starts with the client’s request and results in advice to the client. Nor do these models formalize the specific decision rules the diagnostician should follow in going through the main phases of the process. Westmeyer (1975) proposed an algorithmic model. In this formal model decision algorithms are supposed to work on an adequate empirical knowledge base which contains complete sets of conditional probabilities for a specified type of both problem and client.

All three models presented so far are normative in the sense that they process information according to statistical or decision rules. Strict normative models set formal conditions that usually cannot be met in psychological practice or in the knowledge base this practice is supposed to work with (see Westmeyer & Hageböck, 1992, for a discussion). Therefore, many students of the assessment process have tried to model the process according to more heuristic principles that could guide the process. Most of these heuristic models have been restricted to a diagrammatic presentation of the assessment process (Maloney & Ward, 1976), while some others (De Bruyn, 1992; Haynes & O’Brien, 2000) have led to elaborations which show how the assessor can proceed if he wants to follow the logical decision flow depicted in the model.

FUTURE DIRECTIONS: GUIDELINES FOR THE ASSESSMENT PROCESS

Despite the growing interest in the quality of the assessment process, a comprehensive set of heuristic guidelines that could support the assessor in executing the process is still lacking. This is in contrast to the related fields of testing (American Psychological Association, 1999) and program evaluation (Joint Committee on Standards for Educational Evaluation, 1994) which eventually
have succeeded in the formulation of standards that monitor professional work. It is only recently (Fernández-Ballesteros, 1998) that a task force consisting of psychologists from different fields in psychology started to think of formulating guidelines to cover all phases of the assessment process. The task force formulated a set of guidelines to cover the phases of analysing the case, organizing and reporting results, planning the intervention, and evaluation and follow-up (Fernández-Ballesteros et al., 2001). Instead of being rigid rules, fixed forever, these guidelines represent recommendations for professional behaviour. As already demonstrated in the fields of testing and evaluation, such guidelines contribute highly to the development of the profession. Therefore, as stated by Fernández-Ballesteros et al.: ‘We hope that the efforts made in developing and disseminating these Guidelines stimulate the discussion among interested scientific and professional audiences and, in the long run, will contribute to improve the practice of psychological assessment as well as the education and training of psychological assessors.’ (2001: 185).

References

Allen, Mary J. & Yen, Wendy M. (1979). Introduction to Measurement Theory. Monterey, CA: Brooks/Cole.
American Psychological Association (1999). Standards for Educational and Psychological Tests. Washington, DC: American Psychological Association.
Anastasi, Anne & Urbina, Susana (1997). Psychological Testing. Upper Saddle River, NJ: Prentice Hall.
Baron, Jonathan (1994). Thinking and Deciding (2nd ed.). Cambridge: Cambridge University Press (1st ed., 1988).
Cronbach, Lee J. (1990). Essentials of Psychological Testing (5th ed.). New York: Harper & Row (1st ed., 1949).
De Bruyn, E.E.J. (1992). A normative-prescriptive view on clinical psychodiagnostic decision making. European Journal of Psychological Assessment, 8(3), 163–171.
De Groot, Adriaan D. (1969). Methodology: Foundations of Inference and Research in the Behavioural Sciences. The Hague: Mouton.
Fernández-Ballesteros, R. (1998). Task force for the development of guidelines for the assessment process (GAP). Newsletter of the European Association of Psychological Assessment, 1(1), 2–7.
Fernández-Ballesteros, R., De Bruyn, E.E.J., Godoy, A., Hornke, L.F., Ter Laak, J., Vizcarro, C., Westhoff, W., Westmeyer, H. & Zaccagnini, J.L. (2001). Guidelines for the assessment process (GAP): a proposal for discussion. European Journal of Psychological Assessment, 17(3), 178–191.
Garb, Howard J. (1998). Studying the Clinician: Judgment Research and Psychological Assessment. Washington, DC: American Psychological Association.
Grove, W.M., Zald, D.H., Lebow, B.S., Snitz, B.E. & Nelson, C. (2000). Clinical versus mechanical prediction: a meta-analysis. Psychological Assessment, 12(1), 19–30.
Haynes, Stephen N. & O’Brien, William (2000). Principles and Practice of Behavioural Assessment. New York: Kluwer Academic.
Hogarth, Robin M. (1987). Judgement and Choice: The Psychology of Decision (2nd ed.). Chichester: Wiley (1st ed., 1980).
Joint Committee on Standards for Educational Evaluation (1994). The Program Evaluation Standards (2nd ed.). Thousand Oaks, CA: Sage (1st ed., 1981).
Jones, Edward E. & Nisbett, Richard E. (1971). The actor and the observer: divergent perceptions of the causes of behaviour. In Jones, E.E., Kanouse, D.H., Kelley, H.H., Nisbett, R.E., Valins, S. & Weiner, B. (Eds.), Attribution: Perceiving the Causes of Behaviour (pp. 79–94). Morristown, NJ: General Learning Press.
Maloney, M.P. & Ward, M.P. (1976). Psychological Assessment: A Conceptual Approach. New York: Oxford University Press.
Meehl, Paul E. (1954). Clinical versus Statistical Prediction. Minneapolis, MN: University of Minnesota Press.
Nunnally, Jim C. (1978). Psychometric Theory (2nd ed.). New York: McGraw-Hill (1st ed., 1967).
Sloves, R.E., Doherty, E.M. & Schneider, K.C. (1979). A scientific problem-solving model of psychological assessment. Professional Psychology, 1(1), 28–35.
Turk, D. & Salovey, P. (Eds.) (1988). Reasoning, Inference and Judgment in Clinical Psychology. New York: Free Press.
Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185(4157), 1124–1131.
Von Winterfeldt, Detlof & Edwards, Ward (1986). Decision Analysis and Behavioural Research. Cambridge: Cambridge University Press.
Westmeyer, H. (1975). The diagnostic process as a statistical-causal analysis. Theory and Decision, 6(1), 57–86.
Westmeyer, H. & Hageböck, J. (1992). Computer-assisted assessment: a normative approach. European Journal of Psychological Assessment, 8(1), 1–16.

Eric E.J. de Bruyn

Related Entries

Assessor’s Bias, Clinical Judgement, Case Formulation, Ethics.

A ASSESSOR’S BIAS

INTRODUCTION

Psychological assessment is subject to various errors of measurement. While some are random, as assumed in classical test theory, others are systematic and lead to consistent distortions of the true value of a characteristic. These latter errors may be partially due to assessor’s biases. This term does not refer to elementary professional mistakes such as implementing test instructions incorrectly, but to systematic tendencies in case-related information processing that reduce the validity of data. Although these biases normally impair objectivity and reliability, they remain undetected when they are consistent across individuals and time. In addition, a low interrater agreement is not necessarily a sign of assessor’s bias but may be due to valid differences between settings and informants (Lösel, 2001).

Not all types of assessment information are equally susceptible to assessor’s biases. Whereas standardized tests or biographical inventories are less affected, the impact of these biases may be strong in unstructured interviews, behaviour observations, or trait ratings. For example, some studies on the judgement of job performance have shown that more than half of the variance is due to differences in the assessors (Scullen et al., 2000). In a meta-analysis, approximately 37% of the variance in ratings was attributed to them (Hoyt & Kerns, 1999).

This article concentrates on biases in assessments by other persons (e.g. psychologists, psychiatrists, teachers, or lay informants). Although these biases are similar to the numerous response sets and distortions in self-reports, some seem less important (e.g. lying, simulation, dissimulation, social desirability, or positive self-presentation) and others more relevant (e.g. halo, leniency, stringency, or contrast effects). In the following, we will first describe several of these errors and afterwards address factors that differentiate and moderate these distortions. Finally, we will take a brief look at approaches for detecting and reducing assessor’s biases.

EXAMPLES OF ASSESSOR’S BIASES

Halo and Logical Error

In psychological assessment, a halo effect refers to an overgeneralization from one prominent characteristic of a person to other judgements on this individual. Most typically, it is an overestimation derived from a general impression. For example, if a person is judged to be good in general, he or she will be judged more positively on any specific dimension. Halo errors may arise particularly when there is insufficient information for a detailed assessment or when traits are not well defined. In these cases, the general impression is used to fill information gaps (Saal et al., 1980). A related bias is the logical error. Here, assessors are likely to give similar ratings to traits that seem logically related in their minds (Guilford, 1954). Whereas the halo effect derives from a perceived coherence of characteristics in an individual, the logical error refers to a more explicit and abstract coherence of variables or traits. The latter is often anchored in the assessor’s subjective personality theory.

Both biases produce the same outcome, namely spurious and inflated correlations (Murphy et al., 1993). The underlying mechanisms are also related. Occasionally, a halo effect can have some advantage because it accentuates differences between individuals (Murphy et al., 1993). This is the case when a quick decision has to be made and the core determinants of the halo effect are empirically valid. Then, one can simply follow the useful decision rule ‘take the best, ignore the rest’ (Gigerenzer & Selten, 2001).
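
The statement that halo produces spurious and inflated correlations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the additive model (rating = true score + halo weight × general impression), the independence of the two true dimensions, the weight of 1.0, and all names are hypothetical choices for illustration, not a model proposed by the cited authors.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_ratings(n_persons, halo_weight, seed=0):
    """Ratings on two truly unrelated dimensions, contaminated by one
    general impression per person (assumed additive halo model)."""
    rng = random.Random(seed)
    dim_a, dim_b = [], []
    for _ in range(n_persons):
        true_a = rng.gauss(0, 1)
        true_b = rng.gauss(0, 1)      # independent of true_a by construction
        impression = rng.gauss(0, 1)  # the assessor's general impression
        dim_a.append(true_a + halo_weight * impression)
        dim_b.append(true_b + halo_weight * impression)
    return dim_a, dim_b


r_without_halo = pearson(*simulate_ratings(2000, halo_weight=0.0))
r_with_halo = pearson(*simulate_ratings(2000, halo_weight=1.0))
```

Under these assumptions the expected correlation between the two rated dimensions is w²/(1 + w²) for halo weight w, that is, about 0 without halo and about 0.5 with w = 1, although the underlying characteristics are unrelated.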

Position Effects

Whereas halo effects result from the psychological or logical closeness of the rated characteristics, their sequence or position may have a similar effect. One such distortion is the proximity error. Judgements that are close to each other in time or space contain a higher risk of mutual influence. A related error is the primacy effect, in which the first impression of a person overshadows the assessment of their further behaviour. Its opposite is the recency effect: the last information on a person influences the evaluation of previous data. Early stereotyping (primacy) or easy remembering (recency) are among the mechanisms that underlie these modes of information processing. Although some experiments suggest that recency is more influential than primacy (e.g. Betz et al., 1992), it is questionable whether such findings can be generalized to real-life assessments.

Leniency and Stringency

These errors refer to the tendency to make relatively positive (leniency) or negative (stringency, severity) assessments. For example, somebody who is rather intelligent would be judged to be even more intelligent by a lenient rater but less intelligent by a stringent one. Leniency seems to be more frequent than stringency (Guilford, 1954). It may partially reflect tendencies toward social desirability, harmony, or other dispositions. Assessors who score high in self-monitoring tend to deliver more lenient ratings. Similarly, leniency correlates negatively with conscientiousness and positively with agreeableness (Bernardin et al., 2000). Nonetheless, other studies question the view that leniency is primarily due to personality dispositions. Situational and relational factors must also be taken into account.

Central Tendency

Leniency and stringency go along with polarizations between extreme judgements. In contrast, other assessors tend to produce scores in the middle range. Sometimes, this may express a lack of differentiated information on a person. In other cases, it involves indifferent perceptions or a general ambivalence or insecurity in the assessor.

Contrast and Projection Effects

Biases may also result from comparisons between a person’s behaviour and the assessor’s own dispositions. A contrast error occurs when the assessor attributes characteristics at the opposite pole to his self-perception; a projection effect occurs when he evaluates a person as being similar to himself. Both tendencies relate to self-awareness and self-presentation processes in the assessor. For example, persons with behavioural problems may rate others higher on the same dimensions. Whereas projection errors can contribute to self-worth by reinforcing social comparability, contrast effects can serve a similar function by protecting the assessor’s individuality.

Interactional Biases

Assessors’ biases not only influence their own information processing but also how persons behave in the assessment situation. Although the assessor’s age, gender, ethnicity, role, status, or institutional affiliation may have such effects (Hagenaars & Heinen, 1982), these should not be viewed as biases. Interactional biases refer to influences that derive from the assessor’s information processing. One example is the self-fulfilling prophecy of positive expectations, although the typical Rosenthal effect has not been replicated sufficiently (Elashoff & Snow, 1972). In the practice of psychological assessment, even minor biases can have an effect (e.g. slightly nodding the head or providing other nonverbal reinforcements based on halo or leniency effects). Unstructured interviews are particularly vulnerable to biases derived from the assessor’s attitudes and expectations. Hyman et al. (1975) distinguish three forms: (a) Attitude-structure expectations refer to the belief that the attitudes of the respondent are unified. They resemble the halo effect and may reinforce uniform reactions. (b) Role expectations relate to the respondent’s membership of a certain group. These stereotypes can result in assessor behaviour that triggers prototypical reactions in the respondent. (c) Probability expectations refer to the base rates of diagnostic characteristics in the respective population. They can lead to assessor behaviour that tries to confirm these specific assumptions.

Other interactive biases contribute to missing data. For example, projection or contrast effects may lead an interviewer to evaluate a question as being
extremely difficult or intimate. This can reduce emphasis and thus lead to more incomplete or ‘don’t know’ answers. On the other hand, a very stringent assessor may elicit similar effects through behaviour that reduces the respondent’s willingness to cooperate.

Overall, the impact of such interviewer biases seems to be small or not well-investigated (Hagenaars & Heinen, 1982). Probably, the more an assessor complies with professional standards and is not socially involved, the fewer biases will occur (Hyman, 1975).

DIFFERENTIAL ISSUES

Rater- versus Dyad-Specific Biases

Assessor’s biases contain both rater-specific and dyad-specific components (Hoyt & Kerns, 1999). In the former, the error variance is attributable to the assessor alone (e.g. a rater who generally tends to leniency when judging coworkers). In the latter, it is due to a specific relation between the assessor and the assessee (e.g. a teacher who judges a difficult student more negatively than he should).

Rater-specific biases are a minor problem when only one assessor compares individuals on one dimension, because the error is the same across all judgements. It becomes more problematic when there are several assessors with different biases. The same holds for complex assessments by a single assessor who confounds specific information due to a halo effect.

Dyad-specific biases seem to be more powerful. Because they are less general, they are also more difficult to detect and correct. Neither rater-specific nor dyad-specific biases need to be stable. They may fluctuate over time and situations according to current influences such as emotional state, task involvement, or organizational factors.

Moderating Factors

The magnitude of biases also depends on what information is gathered. Their impact is relatively small (4% of variance) when ratings are based on explicit and objective criteria such as behaviour frequencies (Hoyt & Kerns, 1999). However, it is much stronger (47% of variance) when assessors rate global trait characteristics. Training of assessors is another important moderator. When they are well-trained, less than 10% of variance is attributable to assessor’s biases, but with minimal training, these sources may account for over 50% (Hoyt & Kerns, 1999). Furthermore, rater agreement varies according to the observed behaviour samples. If assessors refer to different samples, they will agree less. However, as mentioned before, such interrater differences may indicate true variance rather than biases (reliability-validity trade-off; Scullen et al., 2000). For example, employees behave differently with their bosses than with their colleagues.

DETECTING AND REDUCING ASSESSOR’S BIASES

The valid assessment of an assessor’s biases is a prerequisite for intervention. Unfortunately, there is little systematic and practice-oriented research on this issue.

One strategy is to reconstruct the errors from the assessor’s judgements. If the assessor rated specific dimensions in various persons and other assessors did the same, inter- and intrarater comparisons are possible. Different frequency distributions, means, variances, and correlations between variables may indicate halo, leniency, extremity, or other errors. However, as mentioned above, this is only possible when assessors work on the same samples of data. Another strategy is to compare the individual judgements with objective data structures. Brunswik’s lens model can be used to compare regression weights between the respective data and both the assessor’s judgement and an objective criterion. For example, a teacher may place too much weight on verbal intelligence in predicting student achievement. Similarly, configurational analyses can be used to detect biases in nonlinear data structures.

Such reconstructions require a great deal of analogue data and judgements. If these are not available, one can try to assess directly what goes on in the assessor’s mind (e.g. by the method of thinking aloud or analysing subjective theories by using structure-placing or repertory grid techniques). However, it is questionable how far these approaches can detect automatized and unconscious mental processes. Verbal ambiguities and social desirability effects must also be expected.

Assessor’s biases may further be reduced through supervision by neutral experts or team feedback


sessions. These approaches are most common in clinical contexts but can also be applied in other fields of psychological assessment.

Last, but not least, assessor’s biases can be reduced by a systematic organization and quality management of the whole assessment process. This includes, for example, standardized procedures, detailed behavioural indicators of categories, intensive training of assessors, random-routine check of assessment quality, re-analysable data registration (e.g. video recordings), adequate time-spacing of judgements, techniques that enhance systematic comparisons (e.g. in pairs vs. ratings), the clear distinction between data description and interpretation, and explicit rules for data integration.

CONCLUSION AND FUTURE PERSPECTIVES

Assessor’s biases are important sources of error variance. Although these biases cannot be eliminated completely in the human process of assessment, they can be reduced substantially. For example, this is possible by following the Guidelines for the Assessment Process recently proposed by a Task Force of the European Association of Psychological Assessment (Fernández-Ballesteros et al., 2001).

References

Betz, A.L., Gannon, K.M. & Skowronski, J.J. (1992). The moment of tenure and the moment of truth: when it pays to be aware of recency effects in social judgements. Social Cognition, 10(4), 397–413.
Bernardin, H.J., Cooke, D.K. & Villanova, P. (2000). Conscientiousness and agreeableness as predictors of rating leniency. Journal of Applied Psychology, 85(2), 232–236.
Elashoff, J.D. & Snow, E. (1971). A Case Study in Statistical Inference: Reconsideration of the Rosenthal–Jacobson Data on Teacher Expectancy. Stanford: Stanford University Press.
Fernández-Ballesteros, R., De Bruyn, E.E.J., Godoy, A., Hornke, L.F., Ter Laak, J., Vizcarro, C., Westhoff, K., Westmeyer, H. & Zaccagnini, J.L. (2001). Guidelines for the Assessment Process (GAP): a proposal for discussion. European Journal of Psychological Assessment, 17(3), 187–200.
Gigerenzer, G. & Selten, R. (Eds.) (2001). Bounded Rationality: The Adaptive Toolbox. Cambridge, MA: MIT Press.
Guilford, J.P. (1954). Psychometric Methods (2nd ed.). New York: McGraw-Hill.
Hagenaars, J.A. & Heinen, T.G. (1982). Effects of role-independent interviewer characteristics on responses. In Dijkstra, W. & van der Zouwen, J. (Eds.), Response Behaviour in the Survey-Interview (pp. 91–130). London: Academic Press.
Hyman, H.H., Cobb, W.J., Feldman, J.J., Hart, C.W. & Stember, C.H. (1975). Interviewing in Social Research. Chicago: University of Chicago Press.
Hoyt, W.T. & Kerns, M.-D. (1999). Magnitude and moderators of bias in observer ratings: a meta-analysis. Psychological Methods, 4(4), 403–424.
Lösel, F. (2001). Risk/need assessment and prevention of antisocial development in young people. In Corrado, R., Roesch, R., Hart, S.D. & Gierowski, J.K. (Eds.), Multiproblem Violent Youth. Amsterdam: IOS Press (in press).
Murphy, K.R., Jako, R.A. & Anhalt, R.L. (1993). Nature and consequences of halo error: a critical analysis. Journal of Applied Psychology, 78(2), 218–225.
Saal, F.E., Downey, R.G. & Lahey, M.A. (1980). Rating the ratings: assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413–428.
Scullen, S.E., Mount, M.K. & Goff, M. (2000). Understanding the latent structure of job performance ratings. Journal of Applied Psychology, 85(6), 956–970.

Friedrich Lösel and Martin Schmucker

Related Entries

Item Bias, Clinical Judgement, Process (The Assessment Process)
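The reconstruction strategy described under ‘Detecting and Reducing Assessor’s Biases’ (comparing means, variances, and correlations across assessors) can be illustrated with a small sketch. It assumes, hypothetically, that each assessor’s ratings are stored as a table with one row per assessee and one column per rated dimension. The function names and the choice of indices are illustrative assumptions, not an established scoring procedure: the overall mean serves as a rough leniency indicator, the overall spread as an indicator of extremity or central tendency, and the mean correlation between dimensions as a rough halo indicator.

```python
from itertools import combinations
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation of two equally long sequences (each must vary)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rater_profile(ratings):
    """Summarize one assessor's ratings.

    ratings: list of rows, one row per assessee, one column per dimension.
    The indices are hypothetical illustrations, not a validated method.
    """
    flat = [value for row in ratings for value in row]
    columns = list(zip(*ratings))
    pair_corrs = [pearson(a, b) for a, b in combinations(columns, 2)]
    return {
        "mean": mean(flat),                      # high values hint at leniency
        "spread": stdev(flat),                   # low values hint at central tendency
        "mean_dimension_corr": mean(pair_corrs)  # high values hint at a halo effect
    }


# Invented example data: two assessors rating the same four persons
# on two dimensions (scale 1 to 5).
assessor_a = [[1, 5], [2, 4], [4, 2], [5, 1]]
assessor_b = [[4, 4], [5, 5], [4, 4], [5, 5]]

for name, data in (("A", assessor_a), ("B", assessor_b)):
    print(name, rater_profile(data))
```

Such indices are only comparable when the assessors rated the same persons, and differences between assessors may reflect true variance rather than bias.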

A ATTACHMENT

Children are attached if they tend to seek proximity to and contact with a specific caregiver in times of stress arising from factors such as distress, illness, or tiredness (Bowlby, 1984). Attachment is a major developmental milestone in the child’s life, and it remains an important


issue throughout the lifespan. In adulthood, attachment representations shape the way adults feel about the strains and stresses of intimate relationships, including parent–child relationships, and the way in which the self in relation to important others is evaluated. Attachment theory is a special branch of Darwinian evolution theory, and the need to become attached to a protective conspecific is considered one of the primary needs in the human species. Attachment theory is built upon the assumption that children come to this world with an inborn inclination to show attachment behaviour – and this inclination would have had survival value, or better: would increase ‘inclusive fitness’ – in the environment in which human evolution originally took place. Because of its ethological basis, assessment of attachment implies careful and systematic observations of verbal and non-verbal behaviour.

ASSESSMENT OF ATTACHMENT IN INFANTS

Attachment to a protective caregiver helps the infant to regulate his or her negative emotions in times of stress and distress, and to be able to explore the environment even if it is somewhat frightening. The idea that children seek a balance between the need for proximity to an attachment figure and the need to explore the wider environment is fundamental to the various attachment measures, such as the Strange Situation procedure (SSP; Ainsworth et al., 1978) and the Attachment Q-Set (AQS; Vaughn & Waters, 1990) (see Table 1). Ainsworth and her colleagues observed one-year-old infants with their mothers in a standardized stressful separation procedure, and used the reactions of the infants to their reunion with the caregiver after a brief separation to assess the amount of trust the children had in the accessibility of their attachment figure.

Table 1. Attachment measures

Attachment measure | 12–24 months | 24–48 months | 12 years and older
Strange Situation | X | |
Attachment Q Sort | X | X |
Adult Attachment Interview | | | X

The SSP consists of eight episodes, of which the last seven ideally take three minutes. Each episode can however be curtailed when the infant starts crying. Episode One begins when the experimenter leads caregiver and child into an unfamiliar playroom. Episode Two is spent by the caregiver together with the child in the playroom. In Episode Three an unfamiliar adult (the ‘stranger’) enters the room and after a while starts to play with the infant. Episode Four starts when the caregiver departs, and the infant is left with the stranger. In Episode Five the caregiver returns, and the stranger unobtrusively leaves the room immediately after reunion. Episode Six starts when the caregiver leaves again: the infant is alone in the room. In Episode Seven the stranger returns. In Episode Eight the caregiver and the infant are reunited once again, and the stranger leaves unobtrusively immediately after reunion.

The Strange Situation procedure has been used with mothers, fathers, and other caregivers. Infants usually are between 12 and 24 months of age. For pre-schoolers, the same SSP is used, but the rating system for classifying the children is different and still is in the process of validation (Cassidy & Marvin, 1992). On the basis of infants’ reactions to the reunion with the caregiver, three patterns of attachment can be distinguished. Infants who actively seek proximity to their caregivers upon reunion, communicate their feelings of stress and distress openly, and then readily return to exploration are classified as secure (B) in their attachment to that caregiver. Infants who seem not distressed, and ignore or avoid the caregiver following reunion are classified as insecure-avoidant (A). Infants who combine strong proximity seeking and contact maintaining with contact resistance, or remain inconsolable, without being able to return to play and explore the environment, are classified insecure-ambivalent (C).

An overview of all American studies with non-clinical samples (21 samples with a total of 1584 infants, studies conducted in the years 1977–1990) shows that about 67% of the infants are classified secure, 21% are classified as insecure-avoidant, and 12% are classified insecure-ambivalent (Van IJzendoorn, Goldberg, Kroonenberg, & Frenkel, 1992). The Strange Situation classifications have been demonstrated to be valid. For example, secure infants have


more sensitive parents than insecure infants (in 66 studies with more than 4000 infants, DeWolff & van IJzendoorn, 1997). Furthermore, secure infants have more satisfactory peer relations, and they develop better language skills (Cassidy & Shaver, 1999). The SSP also shows discriminant validity in comparison with temperament. One of the most powerful demonstrations of the absence of a causal link between attachment and temperament is the lack of correspondence between a child’s attachment relationship to his or her mother, and the same child’s relationship to his or her father.

The concept of ‘disorganized’ attachment emerged from the systematic inspection of about 200 cases from various samples that were difficult to classify in one of the three organized attachment categories (Main & Solomon, 1986). In particular in studies on maltreated infants, the limits of the traditional Ainsworth et al. (1978) coding system became apparent because many children with an established background of abuse or neglect nevertheless had to be forced into the secure category. The common denominator of the anomalous cases appeared to be the (sometimes momentary) absence of an organized strategy to deal with the stress of the SSP. Disorganized attachment can be described as the breakdown of an otherwise consistent and organized strategy of emotion regulation. Whether secure or insecure, every child may show disorganization of attachment depending on the earlier child-rearing experiences. Maltreating parents are supposed to create disorganized attachment in their infants because they confront their infants with a pervasive paradox: they are potentially the only source of comfort for their children, whereas at the same time they frighten their children through their unpredictable abusive behaviour. Disorganization of attachment occurs in about 15% of non-clinical cases, where associations with parental unresolved loss have been found, and it is considered a major risk factor in the development of child psychopathology.

ATTACHMENT IN TODDLERS AND PRESCHOOLERS

Although the SSP has become remarkably popular and successful, it has been a drawback that attachment research was almost exclusively dependent on a single procedure for the measurement of attachment. Waters and his co-workers introduced another method for assessing attachment security in infants and toddlers, i.e. the Attachment Q-Sort (AQS). The AQS consists of 90 cards. On each card a specific behavioural characteristic of children between 12 and 48 months of age is described. The cards can be used as a standard vocabulary to describe the behaviour of a child in the natural home-setting, with special emphasis on secure-base behaviour (Vaughn & Waters, 1990). After several hours of observation the observer ranks the cards into nine piles from ‘most descriptive of the subject’ to ‘least descriptive of the subject’. The number of cards that can be put in each pile is fixed, i.e. 10 cards in each pile. By comparing the resulting Q-sort with the behavioural profile of a ‘prototypically secure’ child as provided by several experts in the field of attachment theory, a score for attachment security can be derived.

The AQS has some advantages over the SSP. First, it can be used for a broader age range (12–48 months) than the SSP. Moreover, AQS scores for attachment security are based on observation of the child’s secure-base behaviour in the home and may therefore have higher ecological validity. Furthermore, because the application of the AQS does not require the artificial induction of stress used in the SSP, the method can be applied in cultures and populations in which standard application of the SSP has proved to be somewhat complicated. Because the AQS is less intrusive than the SSP, it may be used more frequently with the same child, for example in repeated measures designs, in intervention studies, and in studies on children’s attachment networks. Lastly, the application of the AQS in divergent cultures or populations may be attuned to the specific prototypical secure-base behaviour of the children from those backgrounds.

When the AQS is sorted by a trained observer it shows an impressive predictive validity. In particular, the observer AQS is strongly correlated with sensitive responsiveness. At the same time, it should be noted that the association between observer AQS security and SSP security is rather modest (Van IJzendoorn, Vereijken, & Riksen-Walraven, in press). The AQS and the SSP may therefore not measure the same construct, or they may be indexing different dimensions of the


same construct. Support for the validity of the AQS as sorted by the mother is less convincing. The association between the mother AQS and the SSP is disappointingly weak, and the instrument surprisingly shows a stronger association with temperament (van IJzendoorn et al., in press). Mothers of insecure children may lack the observational skills that are necessary for an unbiased registration of secure-base behaviours in their children.

In this contribution three assessment procedures are discussed that play a central role in attachment theory and research. The Strange Situation Procedure (SSP; Ainsworth et al., 1978) has been developed to assess attachment security of infants with their parents or other caregivers in a laboratory playroom. The Attachment Q Sort (AQS; Vaughn & Waters, 1990) is an instrument to observe secure-base behaviour and attachment security in children from 12–48 months at home. The Adult Attachment Interview (AAI; Main, Kaplan, & Cassidy, 1985) is a semi-structured interview with a coding system (Main & Goldwyn, 1994) to assess adolescent and adult mental representations of attachment. We start with a brief discussion of the theoretical background of these assessment tools.

ASSESSMENT OF ATTACHMENT IN ADOLESCENCE AND ADULTHOOD

Attachment experiences are supposed to become crystallized into an internal working model or representation of attachment (Bowlby, 1984), which Main, Kaplan, and Cassidy defined as ‘a set of rules for the organisation of information relevant to attachment and for obtaining or limiting access to that information’ (1985, pp. 66–67). They developed an interview-based method of classifying a parent’s mental representation of attachment, the Adult Attachment Interview (AAI). The AAI is a semi-structured interview that probes alternately for general descriptions of attachment relationships, specific supportive or contradicting memories, and descriptions of the current relationship with one’s parents. The interview can be administered to parents, professional caregivers, and older adolescents, and stimulates respondents to both retrieve attachment-related autobiographical memories and evaluate these memories from their current perspective. For example, subjects are asked which five adjectives describe their childhood relationship with each parent, and what concrete memories or experiences led them to choose each adjective.

The AAI lasts about an hour and is transcribed verbatim. Interview transcripts are rated for security of attachment as derived from the subjects’ present discussion of their attachment biographies (Hesse, 1999). The coding of the interviews is not based primarily on reported events in childhood, but rather on the coherency with which the adult is able to describe and evaluate these childhood experiences and their effects. The interview, therefore, does not assess the actual quality of childhood attachment relationships, and a secure representation of attachment is not incompatible with an insecure attachment history throughout childhood. This is a major difference with questionnaires that ask for descriptions of the relationship with parents or parents’ parenting, in which descriptions of childhood experiences are decisive and taken for granted. Instead, the AAI takes into account that retrospection is not necessarily reliable, and that repression and idealization do take place. Hesse (1999) has suggested that the central task presented to the subject is that of producing and reflecting upon attachment-related memories while simultaneously maintaining coherent discourse with the interviewer.

The coding system of the AAI (Main & Goldwyn, 1994) includes scales for inferred childhood experiences with parents (e.g. loving, rejecting, role-reversing) and scales for state of mind with respect to attachment (e.g. anger, idealization, insistence on lack of recall, coherency). The scale scores for state of mind are of overriding importance when it comes to classification of an interview into one of three main categories. Autonomous or secure adults are able to describe their attachment-related experiences coherently, whether these experiences were negative (e.g. parental rejection or overinvolvement) or positive. They tend to value attachment relationships and to consider them important for their own personality. Dismissing adults tend to devalue the importance of attachment experiences for their own lives or to idealize their parents without being able to illustrate their positive evaluations with concrete events demonstrating secure interaction. They often appeal to lack of memory of childhood


experiences. Preoccupied adults are still very much involved and preoccupied with their past attachment experiences and are therefore not able to describe them coherently. They may express anger or passivity when discussing current relationships with their parents. Dismissing and preoccupied adults are both considered insecure. Some adults indicate through their incoherent discussion of experiences of trauma (such as maltreatment, or the loss of an attachment figure) that they have not yet completed the process of mourning. They receive the additional classification Unresolved, which is superimposed on their main classification. In a meta-analysis on 33 studies, the distribution of non-clinical mothers was as follows: 24% dismissing, 58% autonomous, and 18% preoccupied mothers (Van IJzendoorn & Bakermans-Kranenburg, 1996). About 19% of the mothers were additionally classified as unresolved. Fathers and adolescents showed about the same distribution of AAI classifications. Clinical respondents, however, showed highly deviating distributions, with a strong overrepresentation of insecure attachment representations. Systematic relations between clinical diagnosis and type of insecurity could not be established.

The test–retest reliability of the AAI has been established in several studies, and the same is true of the AAI’s discriminant validity. AAI classifications turned out to be independent of respondents’ IQ, social desirability, temperament, and general autobiographical memory abilities (for a review, see Hesse, 1999). The predictive validity of the AAI has been thoroughly tested in a large number of studies in different countries, and the results can best be described by meta-analytic findings. First, the AAI appears to be predictive of parents’ sensitive responsiveness. Autonomous parents are more responsive to their child’s attachment signals and needs than insecure parents (Van IJzendoorn, 1995). Second, in several (cross-sectional as well as longitudinal) studies parents’ representations of attachment were related to the security of the parent–child attachment relationship as measured through the Strange Situation procedure. Autonomous parents tended to have secure children, dismissing parents had insecure-avoidant children, preoccupied parents had insecure-ambivalent children, and parents with unresolved loss or other trauma more often had disorganized children (Van IJzendoorn, 1995). In longitudinal studies covering the first 15 to 20 years of life, the infant SSP classifications have been found to predict the later AAI classifications when major changes in life circumstances were absent (Waters, Hamilton, & Weinfield, 2000).

CONCLUSIONS AND FUTURE PERSPECTIVES

We conclude that the Strange Situation Procedure, the Attachment Q Sort, and the Adult Attachment Interview have proven to be invaluable tools for testing empirical hypotheses. They have helped to advance attachment theory far beyond Bowlby’s first draft some thirty years ago. During the past ten years or so, several other attachment measures have been developed, mostly based on the same construction principles that guided the development of the SSP, AQS, and AAI (Cassidy & Shaver, 1999). Some measures mirror the SSP and focus on attachment in pre-schoolers (The Preschool Assessment of Attachment), others involve projective techniques for preschoolers and older children, such as the SAT, drawings or photographs, or doll play. Other measures are adaptations of the AAI and cover younger (adolescent) age ranges or different representational dimensions (working model of the child; working model of caregiving). Self-report paper-and-pencil measures have been proposed for assessment of attachment in adolescence or adulthood, as well as interview measures for partner relationships. These alternative attachment measures are still in the process of validation, and do not yet present the psychometric qualities that SSP, AQS, and AAI have shown to possess (Cassidy & Shaver, 1999). In the near future, more data will become available on the reliability and validity of these promising measures. They may help to investigate attachment across the lifespan, in various contexts, populations, and cultures.

References

Ainsworth, M.D.S., Blehar, M.C., Waters, E. & Wall, S. (1978). Patterns of Attachment. Hillsdale, NJ: Lawrence Erlbaum.
Bowlby, J. (1984). Attachment and Loss. Attachment, Vol. 1 (2nd ed.). London: Penguin.


Cassidy, J., Marvin, R.S. & MacArthur Working Group on Attachment. (1992). Attachment organization in three- and four-year-olds: procedures and coding manual. Unpublished coding manual. Pennsylvania State University.
Cassidy, J. & Shaver, P.R. (1999). Handbook of Attachment. Theory, Research, and Clinical Applications. New York: Guilford.
DeWolff, M.S. & Van IJzendoorn, M.H. (1997). Sensitivity and attachment: a meta-analysis on parental antecedents of infant-attachment. Child Development, 68, 571–591.
Hesse, E. (1999). The Adult Attachment Interview: historical and current perspectives. In Cassidy, J. & Shaver, P.R. (Eds.), Handbook of Attachment. Theory, Research, and Clinical Applications (pp. 395–433). New York: Guilford.
Main, M. & Goldwyn, R. (1994). Adult Attachment Classification System. Department of Psychology, University of California at Berkeley. Unpublished manuscript.
Main, M., Kaplan, N. & Cassidy, J. (1985). Security in infancy, childhood, and adulthood: a move to the level of representation. In Bretherton, I. & Waters, E. (Eds.), Growing Points of Attachment Theory and Research (pp. 66–104). Society for Research in Child Development.
Main, M. & Solomon, J. (1986). Discovery of an insecure-disorganized/disoriented attachment pattern. In Brazelton, T.B. & Yogman, M.W. (Eds.), Affective Development in Infancy (pp. 95–124). Norwood, NJ: Ablex.
Van IJzendoorn, M.H. (1995). Adult attachment representations, parental responsiveness, and infant attachment. A meta-analysis on the predictive validity of the Adult Attachment Interview. Psychological Bulletin, 117, 387–403.
Van IJzendoorn, M.H. & Bakermans-Kranenburg, M.J. (1996). Attachment representations in mothers, fathers, adolescents, and clinical groups: a meta-analytic search for normative data. Journal of Consulting and Clinical Psychology, 64, 8–21.
Van IJzendoorn, M.H., Goldberg, S., Kroonenberg, P.M. & Frenkel, O.J. (1992). The relative effects of maternal and child problems on the quality of attachment: a meta-analysis of attachment in clinical-samples. Child Development, 63, 840–858.
Van IJzendoorn, M.H., Vereijken, C.M.J.L. & Riksen-Walraven, J.M.A. (1996). Is the Attachment Q-Sort a valid measure of attachment security in young children? In Vaughn, B., Waters, E. & Posada, D. (Eds.), Patterns of Secure Base Behaviour: Q-sort Perspectives on Attachment and Caregiving in Infancy and Childhood. Hillsdale, NJ: Erlbaum.
Vaughn, B.E. & Waters, E. (1990). Attachment behaviour at home and in the laboratory: Q-sort observations and Strange Situation classifications of one-year-olds. Child Development, 61, 1965–1973.
Waters, E., Hamilton, C.E. & Weinfield, N.S. (2000). The stability of attachment security from infancy to adolescence and early adulthood: general introduction. Child Development, 71, 678–683.

Marinus van IJzendoorn and Marian J. Bakermans-Kranenburg

Related Entries

Personality (General), Emotions, Motivation
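The AQS scoring step described in the section on toddlers and preschoolers (90 cards, nine piles of 10 cards, comparison with the profile of a ‘prototypically secure’ child) can be sketched as follows. The sketch assumes that a sort is recorded as one pile number (1 to 9) per card and that the comparison is made with a Pearson correlation. The entry does not name the comparison statistic, so this is an assumption, and the criterion values used here are invented placeholders, not the expert criterion sort. All function names are hypothetical.

```python
from collections import Counter
from statistics import mean

N_CARDS = 90   # number of AQS cards, as stated in the entry
N_PILES = 9    # number of piles, as stated in the entry
PER_PILE = 10  # fixed number of cards per pile, as stated in the entry


def is_valid_sort(sort):
    """Check the forced distribution: 90 cards, exactly 10 in each of 9 piles."""
    counts = Counter(sort)
    return len(sort) == N_CARDS and all(
        counts.get(pile, 0) == PER_PILE for pile in range(1, N_PILES + 1)
    )


def security_score(sort, criterion):
    """Correlate an observed sort with a criterion profile.

    Assumption: Pearson correlation is used as the comparison statistic.
    """
    if not is_valid_sort(sort):
        raise ValueError("sort does not follow the forced distribution")
    ms, mc = mean(sort), mean(criterion)
    sxy = sum((s - ms) * (c - mc) for s, c in zip(sort, criterion))
    sxx = sum((s - ms) ** 2 for s in sort)
    syy = sum((c - mc) ** 2 for c in criterion)
    return sxy / (sxx * syy) ** 0.5


# Placeholder criterion: card i is simply assigned to pile 1..9 in blocks of 10.
# A real criterion would come from the expert profile, which is not given here.
placeholder_criterion = [pile for pile in range(1, N_PILES + 1) for _ in range(PER_PILE)]
observed = list(placeholder_criterion)  # an observer sort identical to the criterion
print(security_score(observed, placeholder_criterion))
```

Under these assumptions the score ranges from -1 (the observed sort is the reverse of the criterion) to +1 (identical to the criterion).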

A ATTENTION

INTRODUCTION

Attention involves being in a state of alertness, focusing on aspects of the environment that are deemed important for the task at hand, and shutting out irrelevant information. As the task demands change, attention involves the ability to flexibly shift focus to another target. Originally, attention was considered a unitary construct but currently it is conceptualized as a complex process involving (a) distributed neural systems, (b) perceptual, emotional, motivational and motor systems, as well as (c) links to multiple sources of environmental information.

Some commonly studied processes of attention include selecting, sustaining, and shifting. Selection refers to the ability to narrow the field of stimuli to which one attends for the purpose of enhanced processing. Sustained attention refers to the ability to maintain focus and alertness over time. Shifting refers to the ability to change focus of attention to suit one’s goals and needs. Research has focused on visual or auditory attention, although environmental stimuli are perceived through other modalities as well (i.e. touch, smell, taste). In addition, research has focused on attention to the external environment rather than to the internal environment (thoughts


and emotions) since the internal environment is less amenable to objective and reliable methods of assessment.

WHY IS IT IMPORTANT TO ASSESS ATTENTION?

Attention is central to the ability to function perceptually, cognitively and socially. For that reason it is important to have basic scientific understanding of attention processes and the psychological and environmental conditions that govern the development of attention and its deployment under specific circumstances. With such knowledge in hand, one can design environments that promote optimal attention to important characteristics in those settings.

In addition, it is important to assess attention so as to map out individual differences in the development and use of attention. These differences are mostly in the normal range but may also include deficits that are quite marked as seen in children diagnosed with Attention Deficit Disorder or in adults diagnosed with schizophrenia, depression or substance abuse problems.

The assessment of attention is important for parents and teachers who detect difficulties in a child’s ability to focus attention and wish to have the child evaluated. Similarly, attention problems may present in adults who have suffered head injuries or stroke, and who would need to be evaluated to determine the seriousness of the deficits involved. Diagnosing such deficits is dependent on information about individual differences in attention and on the availability of appropriate assessment tools.

ASSESSMENT METHODS

Methods have been developed for the assessment of specific aspects of attention, including selective attention, sustained attention, and shifting attention. These methods include performance tests, mapping brain activity during performance of tasks and, finally, rating scales. Table 1 lists commonly used performance tasks, the aspects of attention they assess and the contexts in which they are used (clinical or research). Additional information can be found in Barkley (1994). Other tests include Trenerry, Crosson and DeBoe’s Visual Search and Attention Test (VSAT), Miller’s California Computerized Assessment Package (CalCAP), Arthur, Barrett and Doverspike’s Auditory Selective Attention Test (ASAT), and The Gordon Diagnostic System. Table 2 lists commonly used scales for rating attention.

FUTURE DIRECTIONS

Deal with Issues Pertaining to Assessment for the Purpose of Increasing Knowledge about Specific Processes

There is a need to understand to what extent the processes outlined above are really independent rather than different manifestations of the same core. This calls for a more integrated understanding of attention and for the development of a basic assessment battery that could be used when people are referred with problems in attention.

Checking Ecological Validity

To what extent are the assessments telling something about functioning under some specific environmental conditions but don’t generalize to these processes as they operate in everyday, out-of-the-lab environments? Questions remain about the extent to which it is possible to do well on all laboratory assessments but have problems in the everyday context. Similarly, is it possible to function well in the everyday environment and yet have problems on laboratory assessments?

Developing an Attention Battery

The battery would need to be based on normative data and would need to have specified cut-off lines between the normal range and problem range. Children would benefit from a routine assessment using such a battery in the same way that they benefit from routine examination of their hearing and vision. Systematically evaluating how children perform in terms of their attention is important since children may have deficits that they mask through idiosyncratic cognitive strategies or by working harder than what would normally be required.


Table 1. Commonly used performance tasks

| Process | Assessment Name | Short Description | Assessed Behaviour | Contexts of Use |
|---|---|---|---|---|
| I. Selective Attention | Children’s Checking Task | Symbol cancellation | Number targets identified; Number targets missed; Incorrect identifications | Research |
| | Digit Symbol/Coding | Wechsler scales subtest | Timed task of correctly indicating which symbol corresponds to a number | Clinical, Research |
| | Stroop Colour-Word Interference Test | Naming the ink colour of words that spell a colour different from the ink colour | Time to complete each portion; Number of correct responses | Research, Clinical |
| | The Trail Making Test | Connecting letters and numbers placed randomly on a page | Time to complete each part; Number of errors | Research, Clinical |
| | Children’s Embedded Figures Test | Identifying a target figure embedded among non-targets | Mean time to respond; Number of correct responses | Clinical, Research |
| | Posner’s Visual-Spatial Selective Attention Test | Responding to targets presented to the left/right visual fields | Difference in reaction time in the presence of valid and invalid cues | Research |
| II. Sustained Attention | Reaction Time Task(s) | Responding to simple target visual stimuli | Mean reaction time; Variability of response time | Research |
| | Continuous Performance Test (CPT) | Responding to target stimuli and inhibiting response to non-target stimuli | Response time; Number correct responses; Errors of omission; Errors of commission | Research, Clinical |
| | KABC Hand Movements | Imitating progressively longer sequences of skilled hand movements | Standard score of successful number of sequences | Research, Clinical |
| III. Shifting Attention | Wisconsin Card Sorting Task | Sorting 128 cards containing sets of geometric designs – varying colour, form, number | % of correct; Number of categories achieved; Perseverative errors; Perseverative responses; Non-perseverative responses | Research, Clinical |
| | Halstead-Reitan Neuropsych. Test Battery – Categories Test | Choosing from 1 of 4 choices from a projected stimuli based on a principle | Number of correct responses; Same behaviours as above | Clinical |
| IV. Numerical Mnemonic Attention | Digit Span | Wechsler scales subtest | Accurate memory for a specific string of numerical stimuli (forward & backward) | Research, Clinical |
| | Arithmetic | Wechsler scales subtest | Correct solutions provided verbally | Research, Clinical |
| V. Physiological | Heart Rate | Electrodes placed on chest record the electrocardiogram (EKG) | Decrements in heart rate reflect attention processes | Research |
| | Cortical Electrophysiology | Electrodes placed on scalp record electroencephalograph (EEG) | Large, slow waves indicate lapses in attention during sustained attention task | Research |
| | Cerebral blood flow | Blood flow to brain regions is mapped by positron emission tomography (PET) | Denser distribution indicates more active metabolism | Research |

Table 2. Commonly used scales for rating attention

Rating scales (Title)
ADD-H Comprehensive Teacher Rating Scale
ADHD Rating Scale
Attention Deficit Disorders Evaluation Scale
Behaviour Assessment System for Children
Child Attention Profile by Edelbrock
Child Behaviour Checklist
Conners’ Parent and Teacher Rating Scale – Revised
Hyperactive Behaviour Code

CONCLUSIONS

Attention is central to cognitive and social functioning and has been the subject of scientific research for decades. It is regulated by neural, perceptual, emotional, motivational and motor systems and influenced by both internal and external stimuli. Because of its central and complex role in behaviour, there are many methods for assessing its various aspects. Despite the long history of interest in the topic, scientists are still working to achieve a greater understanding of attention processes and to develop new assessment tools.

Further Readings

Barkley, R.A. (1994). The assessment of attention in children. In Lyon, G.R. (Ed.), Frames of Reference for the Assessment of Learning Disabilities. Baltimore: Paul H. Brookes Publishing.
Ruff, H.A. & Rothbart, M.K. (1996). Attention in Early Development: Themes and Variations. New York: Oxford University Press.
Underwood, G. (Ed.) (1993). The Psychology of Attention, Vols. I and II. New York: New York University Press.

Sarah Friedman and Anita Konachoff

Related Entries

Theoretical Perspectives: Cognitive, Intelligence (General), Ambulatory Assessment, Brain Activity Measurement, Equipment for Assessing Basic Processes

A ATTITUDES

INTRODUCTION

Evaluation is a fundamental reaction to any object of psychological significance (Jarvis & Petty, 1996; Osgood, Suci, & Tannenbaum, 1957). The present entry reviews some of the major techniques that have been developed to assess these evaluative reactions, or attitudes. A discussion of methods based on explicit evaluative responses – direct and inferred – is followed by a consideration of disguised and implicit assessment techniques. Emphasis is placed on questions of reliability, validity, and practicality.

EXPLICIT MEASURES OF ATTITUDE

Virtually any response can serve as an indicator of attitude toward an object so long as it is reliably associated with the respondent’s tendency to evaluate the object in question. In contrast to implicit responses, which cannot be easily controlled, explicit evaluative responses are under the conscious control of the respondent. Most explicit attitude measures either rely on direct attitudinal inquiries or infer the respondents’ evaluations from their expressions of beliefs about the attitude object.

Direct Evaluations

Single-item direct measures. Laboratory experiments and attitude surveys frequently use single items to obtain direct evaluations of the attitude object. Confronted with the item, ‘Do you approve of the way the President is doing his job?’ respondents may be asked to express their degree of approval on a five-point scale that


ranges from ‘approve very much’ to ‘disapprove very much’. Such single items can be remarkably good indicators, especially for well-formed attitudes toward familiar objects. They are sometimes found to have quite high levels of reliability and to correlate well with external criteria. For example, the single item, ‘I have high self-esteem’ (attitude toward the self), assessed on a five-point scale ranging from ‘not very true of me’ to ‘very true of me’, was found to have a test–retest reliability of 0.75 over a four-year period, compared to a reliability of 0.88 for the multi-item Rosenberg Self-Esteem Scale (Robins, Hendin, & Trzesniewski, 2001, Study 1). Moreover, the single- and multi-item measures correlated highly with each other, and they had comparable correlations with various external criteria (e.g. self-evaluation of physical attractiveness, extraversion, optimism, life satisfaction).

However, single items do not always exhibit such favourable psychometric properties. They often have low reliabilities and can suffer from limited construct validity. Many attitude objects are multidimensional and a single item can be ambiguous with respect to the intended dimension (e.g. ‘religion as an institution’ vs. ‘religious faith’). Furthermore, single items contain nuances of meaning that may inadvertently affect responses to attitudinal inquiries. An item inquiring whether the United States should allow public speech against democracy leads to different conclusions than one asking whether the United States should forbid such speech (see Schuman & Presser, 1981). In addition to such framing effects, research has revealed strong context effects in attitudinal surveys. Respondents tend to interpret a given item in light of the context created by previous questions. Thus, responses to questions about satisfaction with life in general and satisfaction with specific aspects of one’s life, such as one’s work or romantic relationship, are found to be influenced by the order in which these questions are asked (Schwarz, Strack, & Mai, 1991).

Multi-item direct measures. It is possible to raise the reliability of a direct attitude measure by increasing the number of questions asked. The Rosenberg Self-Esteem Scale (Rosenberg, 1965), for example, contains 10 items, each a direct inquiry into self-esteem (e.g. ‘I feel that I am a person of worth, at least on an equal basis with others’; ‘All in all, I am inclined to feel that I am a failure’). Coefficients of internal consistency and test–retest reliability for this measure are typically quite high (see Robinson, Shaver, & Wrightsman, 1991).

The most frequently employed multi-item direct measure of attitude, however, is the evaluative semantic differential (Osgood et al., 1957). Using large sets of seven-point bipolar adjective scales, Osgood and his associates discovered that evaluative reactions (i.e. attitudes) capture the most important dimension of any object’s connotative meaning. Consequently, it is possible to obtain a measure of attitude by asking respondents to rate any construct on a set of bipolar evaluative adjective scales, such as good–bad, harmful–beneficial, desirable–undesirable, pleasant–unpleasant, and useful–useless. When a sufficient number of such scales is used, the evaluative semantic differential is found to have very high internal consistency and temporal stability.

One caveat with respect to the semantic differential has to do with possible ‘construct-scale interactions’. Although certain adjective pairs generally indicate evaluation, these adjectives can take on more specific denotative meaning in relation to particular attitude objects. Thus, the adjective pair sick–healthy usually reflects evaluation when rating people, but it may be a poor measure of evaluation when respondents are asked to judge the construct ‘mental patients’.

Inferred Evaluations

Although multi-item direct attitude measures exhibit high degrees of reliability, they do not address the problems raised by the multidimensionality of attitude objects, or by framing and context effects, problems that jeopardize the validity of direct evaluations. Several standard attitude-scaling methods, such as Thurstone and Likert scaling, avoid these difficulties by sampling a broad range of responses relevant to the attitude object and then inferring the common underlying evaluation. Whereas responses to items on a Thurstone scale are required to have a curvilinear relation to the overall attitude, the more common Likert method requires that item operation characteristics have a linear or at least monotonic shape (Green, 1954). In practice, an investigator using Likert’s method of summated


ratings (Likert, 1932) begins by constructing a large set of items, usually statements of belief, that are intuitively relevant for the attitude object. To illustrate, the following items are part of a Likert scale that was designed to assess attitudes toward illegal immigrants (Ommundsen & Larsen, 1997).

- Illegal aliens should not benefit from my tax dollars.
- There is enough room in this country for everyone.
- Illegal aliens are a nuisance to society.
- Illegal aliens should be eligible for welfare.
- Illegal aliens provide the United States with a valuable human resource.
- We should protect our country from illegal aliens as we would our own homes.

The investigators initially constructed 80 items of this kind. Selection of items that had high correlations with the total score yielded a final 30-item scale. Most Likert scales ask respondents to indicate their degree of agreement with each statement on a five-point scale (strongly agree, agree, undecided, disagree, strongly disagree). Responses to negative items are reverse scored and the sum across all items constitutes the measure of attitude. The respondents’ attitudes are thus inferred from their beliefs about the attitude object (see Fishbein & Ajzen, 1975).

By covering a broad range of issues relevant to the attitude object, multi-item belief-based scales can do justice to the multidimensional nature of the issue under consideration, avoiding the potential ambiguity of direct measures. Furthermore, by including many differently worded questions that appear in unsystematic order, they also avoid idiosyncratic framing and context effects. As a result, standard multi-item attitude scales tend to have high reliability and, in many applications, exhibit high degrees of predictive and construct validity (Ajzen, 1982). Collections of scales designed to assess social and political attitudes can be found in Robinson, Shaver, and Wrightsman (1991, 1999). The obvious disadvantage in comparison to direct attitude assessment lies in the increased time and effort required to develop multi-item inferred attitude scales and in the fact that such scales may not be suitable for large-scale telephone surveys.

DISGUISED ATTITUDE MEASURES

Notwithstanding the psychometric advantages of inferred attitude measures over direct assessment techniques, all explicit measures – direct and inferred – are subject to response biases that may jeopardize their validity. The most serious of these biases is the tendency to respond to attitudinal inquiries in a socially desirable manner (Paulhus, 1991). This tendency is a particularly severe threat to validity when dealing with such socially sensitive issues as racism and sexism, or with potentially embarrassing topics, such as sexual behaviour or tax evasion. Various methods have been developed in attempts to overcome or at least alleviate social desirability responding.

One approach assumes that individuals differ in their tendency to provide socially desirable responses. Scales are available to assess a person’s general tendency to respond in a socially desirable manner (see Paulhus, 1991), and these scales can be used to select attitude items that are relatively free of general social desirability influences or to statistically remove variance due to individual differences in social desirability responding. Unfortunately, this approach fails to identify socially desirable responses that are not part of a general tendency but rather are unique to a given topic or assessment context.

The problem of social desirability responding arises because the purpose of explicit attitude measures is readily apparent. Other approaches to this problem therefore attempt to reduce the measure’s transparency or completely disguise its purpose. In measures of whites’ attitudes toward African Americans, for example, item wording has changed over the years to accommodate the changing social climate. The ethnocentrism scale (Adorno, Frenkel-Brunswik, Levinson, & Sanford, 1950), used in the 1950s, contained such blatantly racist statements as, ‘Manual labor and unskilled jobs seem to fit the Negro mentality and ability better than more skilled or responsible work’. About 15 years later, the Multifactor Racial Attitude Inventory (Woodmansee & Cook, 1967) employed more mildly worded items, such as, ‘I would not take a Negro to eat with me in a restaurant where I was well known’. The most popular explicit attitude scale used today, the Modern Racism Scale


(McConahay, Hardee, & Batts, 1981), is an attempt at a relatively nonreactive measure that captures the ambivalence many people experience with respect to African Americans: negative feelings that contrast with a desire to live up to ideals of equality and fairness. Among the items on this scale are, ‘It is easy to understand the anger of black people in America’ and ‘Blacks are getting too demanding in their push for equal rights’.

Although less blatant than earlier measures, the Modern Racism Scale is still quite transparent in its attempt to assess attitudes toward African Americans and is thus potentially subject to social desirability responding. The error-choice method (Hammond, 1948) was an early attempt to avoid social desirability responding by disguising the purpose of the measurement and exploiting the tendency of attitudes to bias responses without a person’s awareness. Respondents are asked to choose which of two apparently factual items, equidistant from the known state of affairs, is true (e.g. ‘25% of African Americans attend college’ versus ‘55% of African Americans attend college’). Choice of the low estimate may indicate a more negative attitude, but because the survey is presented as a fact quiz, participants will usually not be aware that their attitudes toward African Americans are being assessed and their responses may thus be uninfluenced by social desirability concerns.

IMPLICIT MEASURES OF ATTITUDE

Perhaps the most effective way to avoid response biases associated with explicit attitude measures is not to obscure the test’s purpose but to observe evaluative responses over which respondents have little or no control.

Bodily Responses

A variety of physiological and other bodily responses have been considered as possible indicators of evaluation, including facial expressions, head movements, palmar sweat, heart rate, electrical skin conductance (GSR), and constriction and expansion of the pupil (see Petty & Cacioppo, 1983). By and large, measures of this kind have been found to have relatively low reliability and to be of questionable validity as measures of attitude. The most promising bodily response measure to date is the facial electromyogram (EMG), an electrical potential accompanying the contraction of muscle fibers. Subtle contractions of facial muscles during exposure to attitude-relevant stimuli appear to reveal underlying positive or negative affective states (Petty & Cacioppo, 1983). Relatively few studies have been conducted to test the validity of this method, but even if its validity is confirmed, the facial EMG requires extensive training and complex technology. It is thus not a very practical method for conducting large-scale attitude surveys, although it may be quite useful in a laboratory context.

In a related method, electrodes are attached to various sites and an attempt is made to persuade respondents that physiological responses are being measured and that these responses provide a reliable indication of their true attitudes. Even though no physiological measures are actually taken, respondents believing that their true attitudes are being read by the machine are expected to provide truthful answers to attitudinal inquiries (Jones & Sigall, 1971). Empirical evidence suggests that the ‘bogus pipeline’ method can indeed help to reduce response biases due to social desirability concerns (Quigley-Fernandez & Tedeschi, 1978). This method, however, again requires a fairly complex laboratory setup.

Response Latency

Somewhat more practical are methods that rely on response latencies to assess implicit attitudes because the time it takes to respond to an attitudinal inquiry can be assessed with relative ease. The most popular response-latency method is the Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) which is based on the assumption that evaluative responses or judgements can be activated automatically, outside the respondent’s conscious awareness. Participants are asked to respond as quickly as possible to words that signify the attitude object and words with positive or negative valence. When measuring implicit attitudes toward African Americans, for example, the attitude object may be represented by first names recognized as belonging to white or black


Americans (e.g. ‘Josh’ vs. ‘Jamel’) and the valenced words by common positive or negative concepts (e.g. ‘health’ vs. ‘grief’). Instructions that require highly associated categories to share a response key tend to produce faster reactions than instructions that require less associated categories to share a response key. Prejudiced individuals would therefore be expected to respond more quickly to combinations of black names with negative words than to combinations of black names with positive words, and they should show the reverse pattern for white names. The discrepancy between the response latencies for the two situations is taken as a measure of implicit acceptance of the association between an attitude object and valenced attributes, thus providing an implicit measure of attitude.

An alternative procedure relies on sequential evaluative priming (Fazio, Jackson, Dunton, & Williams, 1995). Applied to the measurement of racial attitudes, photos of black and white faces may be presented as primes, followed by positive or negative target words. The participant is asked to judge the valence of each target word as quickly as possible. As in the IAT, a low response latency is taken as an indication of a strong association between the valenced word and the category (‘black’ or ‘white’) represented by the prime. Thus, if words with negative valence are judged more quickly when they follow a ‘black’ prime as compared to a ‘white’ prime, and when the opposite is true for positive words, it is taken as evidence for a negative attitude toward African Americans.

Response-latency measures have been used mainly in attempts to assess implicit racial and sexual stereotypes and prejudice. Test–retest reliabilities of implicit measures have been found to be of moderate magnitude (0.50 to 0.60) over a time span of one hour to three weeks (Kawakami & Dovidio, 2001); they tend to be virtually uncorrelated with corresponding explicit measures (Fazio et al., 1995; Greenwald et al., 1998; Kawakami & Dovidio, 2001), indicating that they indeed tap a different type of attitude; and they tend to reveal prejudice where explicit measures reveal little or none (e.g. Greenwald et al., 1998), suggesting that implicit measures may be subject to less social desirability bias than explicit measures. However, questions have been raised with respect to the predictive validity of implicit attitude measures. It has been suggested that low response latencies reflect commonly shared and automatically activated stereotypes, but that privately held, explicit beliefs in conflict with the implicit stereotype can override the automatic response in determining actual behaviour (Devine, 1989).

Table 1. Common Attitude Assessment Techniques

| Response type | Representative technique |
|---|---|
| Explicit – direct: Single-item | Self-rating scale |
| Explicit – direct: Multi-item | Semantic differential |
| Explicit – inferred | Thurstone scaling, Likert scaling |
| Disguised | Error-choice method |
| Implicit: Bodily responses | GSR, heart rate, pupillary response, EMG |
| Implicit: Response latency | Implicit association test, evaluative priming |

FUTURE PERSPECTIVES AND CONCLUSIONS

The great effort that has been invested over the years in the development of attitude measurement procedures attests to the centrality of the attitude construct in the social and behavioural sciences. Table 1 summarizes the different types of measures commonly employed in attitude research. Single items are often used with considerable success to assess evaluative reactions to attitude objects, but multi-item instruments that infer attitudes from a broad range of responses to the attitude object tend to yield measures of greater reliability and validity. Implicit attitude measures hold out promise for overcoming people’s tendencies to respond in socially desirable ways to explicit attitudinal inquiries, especially when dealing with sensitive issues or with domains in which attitudes are conflicted or ambivalent. However, more work is needed to establish the conditions under which implicit attitude measures are better indicators of response dispositions than are explicit measures. It appears that implicit attitudes may be predictive of actual behaviour in ambiguous contexts where the relevance of an explicit


attitude is unrecognized or can be denied, but explicit attitudes may override implicit response tendencies when the relevance of the explicit attitude is readily apparent (see Fiske, 1998 for a discussion of these issues).

References

Adorno, T.W., Frenkel-Brunswik, E., Levinson, D.L. & Sanford, R.N. (1950). The Authoritarian Personality. New York: Harper.
Ajzen, I. (1982). On behaving in accordance with one’s attitudes. In Zanna, M.P., Higgins, E.T. & Herman, C.P. (Eds.), Attitude Structure and Function. The Third Ohio State University Volume on Attitudes and Persuasion, Vol. 2 (pp. 3–15). Hillsdale, NJ: Erlbaum.
Devine, P.G. (1989). Stereotypes and prejudice: their automatic and controlled components. Journal of Personality and Social Psychology, 56, 5–18.
Fazio, R.H., Jackson, J.R., Dunton, B.C. & Williams, C.J. (1995). Variability in automatic activation as an unobtrusive measure of racial attitudes: a bona fide pipeline? Journal of Personality and Social Psychology, 69, 1013–1027.
Fishbein, M. & Ajzen, I. (1975). Belief, Attitude, Intention, and Behaviour: An Introduction to Theory and Research. Reading, MA: Addison-Wesley.
Fiske, S.T. (1998). Stereotyping, prejudice, and discrimination. In Gilbert, D.T., Fiske, S.T. & Gardner, L. (Eds.), The Handbook of Social Psychology, Vol. 2 (4th ed., pp. 357–411). Boston, MA: McGraw-Hill.
Green, B.F. (1954). Attitude measurement. In Lindzey, G. (Ed.), Handbook of Social Psychology, Vol. 1 (pp. 335–369). Reading, MA: Addison-Wesley.
Greenwald, A.G., McGhee, D.E. & Schwartz, J.L.K. (1998). Measuring individual differences in implicit cognition: the implicit association test. Journal of Personality and Social Psychology, 74, 1464–1480.
Hammond, K.R. (1948). Measuring attitudes by error choice: an indirect method. Journal of Abnormal and Social Psychology, 43, 38–48.
Jarvis, W.B.G. & Petty, R.E. (1996). The need to evaluate. Journal of Personality and Social Psychology, 70, 172–194.
Jones, E.E. & Sigall, H. (1971). The bogus pipeline: a new paradigm for measuring affect and attitude. Psychological Bulletin, 76, 349–364.
Kawakami, K. & Dovidio, J.F. (2001). The reliability of implicit stereotyping. Personality and Social Psychology Bulletin, 27, 212–225.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 5–53.
McConahay, J.B., Hardee, B.B. & Batts, V. (1981). Has racism declined in America: it depends on who is asking and what is asked. Journal of Conflict Resolution, 25, 563–579.
Ommundsen, R. & Larsen, K.S. (1997). Attitudes toward illegal aliens: the reliability and validity of a Likert-type scale. The Journal of Social Psychology, 135, 665–667.
Osgood, C.E., Suci, G.J. & Tannenbaum, P.H. (1957). The Measurement of Meaning. Urbana, IL: University of Illinois Press.
Paulhus, D.L. (1991). Measurement and control of response bias. In Robinson, J.P., Shaver, P.R. & Wrightsman, L.S. (Eds.), Measures of Personality and Social Psychological Attitudes (pp. 17–59). San Diego, CA: Academic Press.
Petty, R.E. & Cacioppo, J.T. (1983). The role of bodily responses in attitude measurement and change. In Cacioppo, J.T. & Petty, R.E. (Eds.), Social Psychophysiology: A Sourcebook (pp. 51–101). New York: Guilford Press.
Quigley-Fernandez, B. & Tedeschi, J.T. (1978). The bogus pipeline as lie detector: two validity studies. Journal of Personality and Social Psychology, 36, 247–256.
Robins, R.W., Hendin, H.M. & Trzesniewski, K.H. (2001). Measuring global self-esteem: construct validation of a single-item measure and the Rosenberg Self-Esteem Scale. Personality and Social Psychology Bulletin, 27, 151–161.
Robinson, J.P., Shaver, P.R. & Wrightsman, L.S. (Eds.) (1991). Measures of Personality and Social Psychological Attitudes. San Diego, CA: Academic Press.
Robinson, J.P., Shaver, P.R. & Wrightsman, L.S. (Eds.) (1999). Measures of Political Attitudes. Measures of Social Psychological Attitudes, Vol. 2. San Diego, CA: Academic Press.
Rosenberg, M. (1965). Society and the Adolescent Self-Image. Princeton, NJ: Princeton University Press.
Schuman, H. & Presser, S. (1981). Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. San Diego, CA: Academic Press.
Schwarz, N., Strack, F. & Mai, H.-P. (1991). Assimilation and contrast effects in part-whole question sequences: a conversational logic analysis. Public Opinion Quarterly, 55, 3–23.
Woodmansee, J. & Cook, S. (1967). Dimensions of racial attitudes: their identification and measurement. Journal of Personality and Social Psychology, 7, 240–250.

Icek Ajzen

Related Entries

Personality (General), Interests, Emotions, Environmental Attitudes and Values.
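
Worked illustration (Likert summated ratings). The scoring procedure described in this entry under Inferred Evaluations, in which responses to negative items are reverse scored and the sum across all items constitutes the measure of attitude, can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the procedure of any particular published scale: the 1 to 5 response coding, the choice of which item positions count as negative, and the function names likert_total and item_total_correlations are all hypothetical. The item-total correlation shown is the simple uncorrected kind, in which each item is also part of the total it is correlated with.

```python
from math import sqrt

# Assumed coding (hypothetical): 1 = strongly disagree ... 5 = strongly agree.
SCALE_MIN, SCALE_MAX = 1, 5


def reverse_score(response):
    """Mirror a response around the scale midpoint (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - response


def keyed_responses(responses, negative_items):
    """Return one respondent's responses with negative items reverse scored."""
    keyed = []
    for index, response in enumerate(responses):
        if not SCALE_MIN <= response <= SCALE_MAX:
            raise ValueError(f"response {response} is outside the scale range")
        keyed.append(reverse_score(response) if index in negative_items else response)
    return keyed


def likert_total(responses, negative_items):
    """Summated rating: the sum of the keyed responses across all items."""
    return sum(keyed_responses(responses, negative_items))


def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_total_correlations(data, negative_items):
    """Correlate each keyed item with the (uncorrected) total score.

    data is a list of respondents, each a list of item responses.
    """
    keyed = [keyed_responses(row, negative_items) for row in data]
    totals = [sum(row) for row in keyed]
    n_items = len(keyed[0])
    return [pearson_r([row[i] for row in keyed], totals) for i in range(n_items)]


if __name__ == "__main__":
    # Three hypothetical respondents, three items; the item at index 1 is negatively worded.
    sample = [[1, 5, 1], [3, 3, 3], [5, 1, 5]]
    print([likert_total(row, {1}) for row in sample])
    print(item_total_correlations(sample, {1}))
```

In an item-selection step of the kind the entry describes, items with low item-total correlations would be candidates for removal; the cut-off used for that decision is a judgment call that this sketch does not attempt to fix.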


A ATTRIBUTION STYLES

INTRODUCTION

Shortly after research on attribution theory blossomed, measures were developed to assess attributional style – the presence of cross-situational consistency in the types of attributions people make. Two approaches to measuring attributional style are reviewed here. The first involves global measures that assume attributional style broadly applies across a variety of situations (see Table 1 for a list of the most widely used measures of attributional style). These measures were developed to test predictions from the reformulated theory of learned helplessness depression (Abramson, Seligman, & Teasdale, 1978). The second approach involves more specific measures of attributional style. This approach emerged, in part, from critiques of the cross-situational consistency of the global measures. These measures assess attributional style in more limited contexts such as work, school, and relationships.

Table 1. Widely used measures of attributional style

Global measures
Attributional Style Questionnaire (ASQ; Peterson et al., 1982)
Attributional Style Assessment Test (ASAT; Anderson & Riger, 1991)
Children’s Attributional Style Questionnaire (CASQ; Seligman et al., 1984)
Content Analysis of Verbatim Explanations (CAVE; Peterson, 1992)

Intermediate measures
Academic Attributional Style Questionnaire (AASQ; Peterson & Barrett, 1987)
Organizational Attributional Style Questionnaire (OASQ; Kent & Martinko, 1995)
Relationship Attribution Measure (RAM; Bradbury & Fincham, 1990)

GLOBAL MEASURES OF ATTRIBUTIONAL STYLE

Dimensional Measures

Dimensional measures of attributional style require respondents to generate causes for hypothetical events and then to rate them along several attributional dimensions. The Attributional Style Questionnaire (ASQ; Peterson, Semmel, Von Baeyer, Abramson, Metalsky, & Seligman, 1982) is the most widely known. It contains 12 hypothetical events, half describing positive events (‘you meet a friend who compliments you on your appearance’) and half describing negative events (‘you go out on a date and it goes badly’). Events are further divided into an equal number of interpersonal and achievement contexts. The perceived cause of each event is rated along the dimensions of locus (due to the person or the situation), stability (likely or unlikely to occur again), and globality (limited in its influence or widespread) using seven-point scales. Scores can be computed for each dimension within positive and negative events. Factor analyses of the ASQ have supported the presence of distinct attributional styles for negative and positive events (Xenikou, Furnham, & McCarrey, 1997), although results presented by Cutrona, Russell, and Jones (1985) indicate that each event on the ASQ represents its own factor. However, findings suggest that attributions for negative events are most strongly related to depression (Sweeney, Anderson, & Bailey, 1986). Scores can be further analysed within interpersonal and achievement contexts, a distinction that appears to be more relevant to positive than negative events.

The ASQ has proven to be a valid predictor of depression. People who make internal, stable, and

[8.8.2002–12:29pm] [1–128] [Page No. 116] FIRST PROOFS {Books}Ballesteros/Ballesteros-A.3d Paper: Ballesteros-A Keyword Attribution Styles 117

global attributions for negative events tend to be depression, loneliness, and shyness as well as more depressed. However, there are at least four depressive-like motivational deficits in laboratory problems with the ASQ. First, internal consis- settings. Furthermore, this body of work has tency for the ASQ ranges from adequate to low, demonstrated the importance of assessing attribu- especially for the locus dimension. A frequent tional styles separately for interpersonal and solution is to combine the three dimensions into noninterpersonal situations. Finally, this work a single index to increase reliability, as the has shown substantial correlations between dimensions tend to correlate highly with one attributional styles for successful events and another. However, this creates a second problem depression (and loneliness and shyness). one of interpretation. There are unique predic- Several other dimensional measures of attribu- tions for each attributional style dimension; using tional style use the same basic approach as the a composite score prevents valid tests of the ASQ and ASAT. The Balanced Attributional model (Carver, 1989). Reivich (1995) advises Style Questionnaire (BASQ; Feather & researchers to analyse ASQ data in terms of both Tiggemann, 1984) uses a format similar to the individual dimensions and composite scores. The ASQ but, like the ASAT, the positive and third problem is also related; the ASQ does not negative items mirror one another. The scales assess the key attributional dimension of con- have moderate reliabilities and correlate with trollability. The few studies that included depression, self-esteem, and protestant work controllability consistently find that it is the ethic. The Real Events Attributional Style most important attributional style dimension, Questionnaire (REASQ; Norman & Antaki, whereas globality is the least important (e.g. 1988) requires that respondents generate the Deuser & Anderson, 1995). 
The fourth problem positive and negative events for which they then concerns the affiliation versus achievement make attributions. This may yield a better distinction; several of the ‘achievement’ items prediction of depression, but the loss of item involve affiliative contexts. The Expanded standardization creates other problems. Attributional Style Questionnaire (EASQ; Peterson & Villanova, 1988) uses an identical format to the ASQ and addresses the problem of Forced-Choice Measures low reliability by increasing the number of situations included in the measure. However, Forced-choice measures have respondents select a reliabilities remain modest and the other pro- cause from a list of potential explanations. One blems remain unresolved. benefit is that this method may more accurately The third and fourth versions of the mirror how people typically select a cause (i.e. Attributional Style Assessment Test (ASAT-III without thinking about dimensions). Also, the and ASAT-IV) provide another dimensional types of causes in the list can be restricted to only assessment of attributional style (Anderson & those attributions of theoretical interest. Forced- Riger, 1991). These measures use a format choice measures also require less time to similar to the ASQ but they incorporate a complete. larger number of items (20 for the ASAT-III The ASAT-I and ASAT-II use this forced and 36 for the ASAT-IV), include the controll- choice format. Respondents are provided with a ability dimension, and use success and failure number of hypothetical situations (20 for the items that mirror each other (e.g. ‘succeeded’ vs. ASAT-I and 36 for the ASAT-II). On the ASAT- ‘failed’ at coordinating an outing for a group of I, the listed types of causes are strategy, ability, people...). The interpersonal versus noninterper- effort, personality traits, mood, and circum- sonal subsets of items are more clearly differ- stances. 
ASAT-II includes only strategy, effort, entiated than the affiliation versus achievement and ability causes. The number of times a items of the ASQ. Internal reliabilities at the particular cause is selected is summed to create a subscale level tend to be weak to modest, in measure of attributional style for that dimension. the 0.5–0.6 range; collapsing across situation Kuder–Richardson (K-R 20) reliabilities for types (e.g. ignoring the interpersonal vs. non- the subscales tend to be in the low to moderate interpersonal distinction) yields somewhat larger range. Correlations with loneliness and de- alphas. These scales have successfully predicted pression have established the validity of these


scales in both U.S. and Mainland China college student populations (Anderson, 1999).[1]

Measures For Children

The Children’s Attributional Style Questionnaire (CASQ; Seligman et al., 1984) was developed to allow researchers to study attributional style in children ages 8–13. The CASQ includes 48 items divided equally between positive (‘You get an “A” on a test’) and negative events (‘You break a glass’). The scale uses both a forced-choice and a dimensional approach. Respondents select between two possible causes for the event, and each option represents the presence or absence of one attribution dimension (for example, an internal or external cause). Attributions for each dimension are computed by summing the number of internal, stable, or global responses. Scores similar to the ASQ can then be computed. Internal consistency of the CASQ is low to adequate and improves when the separate dimensions are combined into a single composite.

Content Analysis Measure

The Content Analysis of Verbatim Explanations (CAVE; Peterson, 1992) technique assesses attributional style through a content analysis of an individual’s writing. This allows analysis of ecologically valid events without requiring the participant to complete a questionnaire. The CAVE can also be applied to historical data, and it has established the stability of attributional style over a 52-year period (Burns & Seligman, 1989). Coders first extract causal explanations from a text, then rate them along the dimensions of locus, stability, and globality. Inter-rater reliability for the CAVE technique is satisfactory, and internal consistency has been reported as low to adequate. More standard questionnaire measures of attributional style may be better predictors of depression, but the CAVE technique has proven useful when written content is all that is available.

INTERMEDIATE MEASURES OF ATTRIBUTIONAL STYLE

Global measures of attributional style assume a high degree of cross-situational consistency in the types of attributions people make. However, several studies have questioned this assumption. Cutrona et al. (1985) found that the ASQ was a poor predictor of attributions for actual events, suggesting that situational factors may play a more important role in predicting attributions. Factor analyses by Cutrona et al. (1985) suggest that there is little cross-situational consistency in global measures of attributional style.

Intermediate measures of attributional style address this problem by limiting the situations about which an explanatory style is being assessed. Increased specificity should increase the ability of such measures to predict actual attributions. The ASAT’s emphasis on four situation types (success/failure by interpersonal/noninterpersonal) is one approach to increasing specificity. Other research on this issue has been mixed, however (Henry & Campbell, 1995), suggesting that further work is needed to establish the appropriate level of specificity in attributional style measures.

Academic Settings

Two measures have been used to assess attributional style in academic settings. The Academic Attributional Style Questionnaire (AASQ; Peterson & Barrett, 1987) uses the same format as the ASQ and contains descriptions of 12 negative events that occur in academic settings. The measure has demonstrated high internal consistency, and findings suggest that students who make internal, stable, and global attributions for negative events tend to do more poorly in classes. Henry and Campbell (1995) also developed a measure of attributional style for academic events. Their measure contains 20 items, equally divided between positive and negative events. The measure displayed adequate to good reliability and also predicted academic performance.

Work Settings

The Organizational Attributional Style Questionnaire (OASQ; Kent & Martinko, 1995) was developed to assess attributional style for negative events in a work setting. The format is similar to that of the ASQ, and the measure contains descriptions of 16 negative events that can occur in a work setting. After writing down an explanation for the event, respondents rate the


explanation along the dimensions of internal locus, external locus, stability, controllability, globality, and intentionality. The internal consistency for the scale is moderate to good.

Relationships

Several different types of intermediate attributional style measures have been developed for measuring attributions in the context of relationships. The Relationship Attribution Measure (RAM; Bradbury & Fincham, 1990) assesses the types of attributions people make for a spouse’s negative behaviour. Respondents read a hypothetical negative action by their partner and rate the causes of that event along six dimensions: locus, stability, globality, and responsibility (intent, selfishness, and blame). Researchers can use either a four- or eight-item version. A composite of all attributional dimensions displays high internal consistency and predicts marital satisfaction. Partners who attribute negative partner behaviour to internal, stable, and global causes are more likely to be dissatisfied with the relationship. Fincham has also developed a version of the RAM for use with children to assess attributions for parent–child interactions. The Children’s Relationship Attribution Measure (CRAM; Fincham, Beach, Arias, & Brody, 1998) uses a format similar to the RAM, and contains descriptions of two negative events.

CONCLUSIONS

Future Research

Measures of attributional style have generated several issues which require additional research. The first issue involves level of specificity. Many studies question the presence of a global attributional style, and it is not clear if intermediate measures provide a satisfying solution to this problem. Additional research is needed to resolve these issues. Furthermore, attributional style measures typically suffer from poor reliability. New measures need to be developed to address this shortcoming. Finally, more research is needed on the controllability dimension of attributional style and on the unique contributions of the various attributional dimensions.

Using Attributional Style Measures

There are numerous ways of measuring attributional style, each with particular strengths and weaknesses. In deciding which scale to use, the researcher needs to carefully consider the specific goals of the research project, and then pick the tool that best meets the needs of that project. The modest reliabilities of these scales suggest that considerable attention should be paid to sample size and power.

Notes

[1] The various ASAT scales, as well as Chinese versions of the ASAT-I, the Beck Depression Inventory, and the Revised UCLA loneliness scales, can be downloaded from the following web site: psych-server.iastate.edu/faculty/caa/Scales/Scales.html

References

Abramson, L.Y., Seligman, M.E.P. & Teasdale, J. (1978). Learned helplessness in humans: Critique and reformulation. Journal of Abnormal Psychology, 87, 49–74.
Anderson, C.A. (1999). Attributional style, depression, and loneliness: a cross-cultural comparison of American and Chinese students. Personality and Social Psychology Bulletin, 25, 482–499.
Anderson, C.A. & Riger, A.L. (1991). A controllability attributional model of problems in living: dimensional and situational interactions in the prediction of depression and loneliness. Social Cognition, 9, 149–181.
Bradbury, T.N. & Fincham, F.D. (1990). Attributions in marriage: review and critique. Psychological Bulletin, 107, 3–33.
Burns, M.O. & Seligman, M.E.P. (1989). Explanatory style across the life span: evidence for stability over 52 years. Journal of Personality and Social Psychology, 56, 471–477.
Carver, C.S. (1989). How should multifaceted personality constructs be tested? Issues illustrated by self-monitoring, attributional style, and hardiness. Journal of Personality and Social Psychology, 56, 577–585.
Cutrona, C.E., Russell, D. & Jones, R.D. (1985). Cross-situational consistency in causal attributions: does attributional style exist? Journal of Personality and Social Psychology, 47, 1043–1058.
Deuser, W.E. & Anderson, C.A. (1995). Controllability attributions and learned helplessness: some methodological and conceptual problems. Basic and Applied Social Psychology, 16, 297–318.
Feather, N.T. & Tiggemann, M. (1984). A balanced measure of attributional style. Australian Journal of Psychology, 36, 267–283.
Fincham, F.D., Beach, S.R.H., Arias, I. & Brody, G.H. (1998). Children’s attributions in the family: the children’s relationship attribution measure. Journal of Family Psychology, 12, 481–493.
Henry, J.W. & Campbell, C. (1995). A comparison of the validity, predictiveness, and consistency of a trait versus situational measure of attributions. In Martinko, M.J. (Ed.), Attribution Theory: An Organizational Perspective (pp. 35–52). Delray Beach, FL: St. Lucie Press.
Kent, R. & Martinko, M. (1995). The development and evaluation of a scale to measure organizational attributional style. In Martinko, M. (Ed.), Attribution Theory: An Organizational Perspective (pp. 53–75). Delray Beach, FL: St. Lucie Press.
Norman, P.D. & Antaki, C. (1988). Real events attributional style questionnaire. Journal of Social and Clinical Psychology, 7, 97–100.
Peterson, C. (1992). Explanatory style. In Smith, C.P. & Atkinson, J.W. (Eds.), Motivation and Personality: Handbook of Thematic Content Analysis (pp. 376–382). New York: Cambridge University Press.
Peterson, C. & Barrett, L. (1987). Explanatory style and academic performance among university freshmen. Journal of Personality and Social Psychology, 53, 603–607.
Peterson, C., Semmel, A., Von Baeyer, C., Abramson, L., Metalsky, G.I. & Seligman, M.E.P. (1982). The attributional style questionnaire. Cognitive Therapy and Research, 3, 287–300.
Peterson, C. & Villanova, P. (1988). An expanded attributional style questionnaire. Journal of Abnormal Psychology, 97, 87–89.
Seligman, M.E.P. et al. (1984). Attributional style and depressive symptoms among children. Journal of Abnormal Psychology, 93, 235–238.
Sweeney, P., Anderson, K. & Bailey, S. (1986). Attributional style in depression: a meta-analytic review. Journal of Personality and Social Psychology, 50, 974–991.
Xenikou, A., Furnham, A. & McCarrey, M. (1997). Attributional style for negative events: A proposition for a more reliable and valid measure of attributional style. British Journal of Psychology, 88, 53–69.

Robert M. Hessling, Craig A. Anderson and Daniel W. Russell

Related Entries

Personality (General), Cognitive Styles, Motivation, Irrational Beliefs.

AUTOBIOGRAPHY

INTRODUCTION

Autobiography constitutes a critical resource for psychological assessment and yet a complex challenge to it. The essence of this challenge lies in the fact that autobiography can be seen as both a focus of assessment and a means of conducting it. Since autobiography does not lend itself to assessment by instruments or scales, the sections in this entry will focus on general issues associated with the defining, assessing, and researching of autobiography, as well as on future developments concerning it.

DEFINING AUTOBIOGRAPHY

Autobiography is a narrative accounting of a person’s life as interpreted or articulated by the person him or herself. It is a self-report by which a person expresses, explains, or explores his or her subjective experience over time. It thus represents a route to what it means and feels like to be that person, on the inside. Such a definition distinguishes immediately between autobiography and biography (an account of a life, presumably with greater objectivity, by someone else). An equivalent term for autobiography would be life story. This can in turn be distinguished from life history, or indeed case history, which is an account of a life for specific purposes by, for example, a social worker or physician.

Starting from this basic definition, autobiography can be categorized according to whether it is formal or informal. Though the distinction can be a fine one, formal autobiography means a deliberate and comparatively structured recounting of one’s


life with the express intention of summing it up to date or making a public statement concerning it. While the expression may take many forms, including poetry and sculpture, obvious examples range from a published memoir to a curriculum vita. Informal autobiography includes what one reveals about oneself in less intentional ways, through one’s speech, as in conversation or therapy, one’s words, as in letters or diaries, or one’s gestures and deeds. Behind both formal and informal autobiography lies one’s autobiographical memory, or the memory one has of one’s life as a whole (Rubin, 1996). However, insofar as such memory is internal to a person, assessments of its structure and possible impairments are impossible except as it is mediated by that person’s actions or words. In this entry, then, ‘autobiography’ means any autobiographical activity that has some mode of external expression.

Additional distinctions by which autobiography can be categorized – and assessed – are whether it is voluntary (spontaneous, self-directed) or involuntary (requested, assigned); intended for a public audience or for private reflection; partial (concerning a particular period or theme in one’s life) or complete (concerning one’s life as a whole); superficial or in-depth; and whether the cue prompting it is specific or general (for example, What was it like growing up blind? or simply Tell me about your life).

ASSESSING AUTOBIOGRAPHY

What is assessed from autobiographical activity, the method or instrument by which the assessment is carried out, and the theoretical perspective(s) in which the assessment is rooted, depend on the discipline or context that is involved.

Within the context of psychology, the most obvious example of this point is in relation to psychotherapy, and not least to the field of psychoanalysis. While the assessment and interpretation of autobiography constitute an integral source of information about an individual and about possible issues or themes on which the analysis can focus, the focus itself depends on the therapeutic perspective that is employed. Accordingly, it may be on, for example, a person’s self-concept; degree of introversion–extroversion; obvious omissions from the person’s self-report and their possible significance; evidence of self-deception or of specific disorders; and/or locus of control.

Within , the focus may be on one’s interpretation of life events; on one’s life-course trajectory; on the evolution of personal identity (McAdams, 1988); on guiding personal metaphors; on the relationship between life story and values or emotions; and on changes over time in the content and form of one’s self-report – or ‘the development of autobiography’ (Bruner, 1987). Within social psychology, sociology, and anthropology, the focus of assessment may be on the social constructedness of the self and on how ‘narrative practice’ (Holstein and Gubrium, 2000) concerning the self is portrayed and utilized. As conventions of self-talk and self-representation, or ‘forms of self-telling’ (Bruner, 1987), can vary profoundly by culture, language, gender, ethnicity, and class, they are necessarily of major concern in assessing differences in the accounts that individuals give of their lives.

Within cognitive science, the aim of assessment may be on the formation and function of one’s autobiographical memory and on its completeness, reliability, and accuracy – that is, the interplay between fact and fiction within autobiographical memory (Rubin, 1996), or between ‘historical truth’ and ‘narrative truth’ (Spence, 1982).

Within a healthcare context, autobiographical activity can convey invaluable information concerning a patient’s medical history, social networks and relationships, living conditions, and overall emotional and cognitive status. It can also provide a reference point for assessing differences between subjective and objective measures of physical health; and can assist in the detection and diagnosis of particular psychopathologies, including dementia.

Within the humanities, and specifically literary criticism, assessment of autobiographical activity may draw upon psychological or psychoanalytic theory to focus on the various functions, personal and social, that autobiography serves for the person who engages in it (LeJeune, 1989). In addition, it can focus on the narrative structure and integrity of particular autobiographical texts in terms of, for instance, plot, genre, theme, metaphor, point of view, and voice; on the role of language, and thus culture, in the formation and development of self-awareness and subjectivity; on the complex inter-relationships between


author, text, context, and audience (Olney, 1980); and on the philosophical and hermeneutical significance of being, at once, composer, narrator, editor, character, and reader in relation to one’s own life story (Randall, 1995).

Finally, within gerontology, the study and assessment of autobiographical activity has perhaps a special significance insofar as gerontology is concerned with social and psychological development across the lifespan. Accordingly, the focus may overlap with that used in other disciplines and be on, for example, an individual’s subjective experience of the ageing process, or biographical ageing; on the question of competence and of the relationship between person and environment (Svensson, 1996); and on the role played by autobiographical activity in relation to life review, generativity, spirituality, and preparing for death.

One particular method that uses autobiography in working with older adults – as a means not only of assessment but also of education, recreation, and (informal) therapy – is called ‘guided autobiography’ (Birren and Deutchman, 1991). In guided autobiography, persons write about their lives in relation to set themes – such as career, family, money, health, and love – and then share their writings with other individuals in a group setting. Such groups have been shown to be successful for those involved in increasing their sense of self-understanding and of personal integration.

In general, autobiographical activity in advanced age can be assessed and utilized in terms of numerous functions that it can be said to serve:

• identifying and honouring key turning-points during one’s life-course
• coming to grips with past resentments and negative feelings
• setting the record straight
• finding meaning amid life’s struggles and challenges
• seeking answers to personal issues
• reviewing one’s life to attain a sense of peace
• leaving a unique legacy of experience and wisdom

It should be noted, though, that autobiographical activity can serve many of the above functions at any point throughout the lifespan, and not only in later life.

RESEARCHING AUTOBIOGRAPHY

From a research perspective, it would be valuable to examine the development of autobiography using qualitative methods within a longitudinal design. Of course, the very nature of autobiography leads us to treat it as ‘longitudinal’, since it provides a good characterization of how a person perceives his or her past in light of what life is like today and is expected to be like tomorrow, or in the future. However, such data represent not the past as it was at the time it occurred – not the ‘true story’ – but the past as perceived at the time it is recounted, and as portrayed to a particular audience. Of central interest in research on autobiography, then, would be how people’s perceptions of their lives change, or remain stable, as they age, and what changes occur in both the selection of events that they recount and the angle or tone from which those events are interpreted and told.

One possible design is to ask people at age 60, for example, to tell about their lives at 60, at age 70 to tell about life at 70, and so on. This would enable an assessment of the degree of change or stability in the content of their autobiographies as they grow older. Similarly, asking people at 70 to tell about life at 60, and at 80 to tell about life at 70 (and 60), would permit an assessment of change and stability in people’s perspectives on both their age and the ageing process. Finally, having people at 60 tell about their entire lifespan, at 70 the same, and so on, would provide a picture of the relative change and stability in their perspectives on the content and significance of their lives as a whole. Overall, such a design would permit a better understanding of how people perceive, represent, and interpret their lives at different stages.

FUTURE PERSPECTIVES

In the future, due to rapid social change, there will probably be a more pronounced need for, and use of, autobiography as a means for individuals to evaluate, understand, and integrate their lives, if not as a continuous process, then at different intervals over the lifespan. From a research perspective, there will most probably be a greater focus on using autobiographical data in longitudinal studies, especially of older persons, to


gain a sense of change and stability in their inner experiences of the ageing process.

Though it presents many issues for consideration, autobiography constitutes a valuable tool in several disciplines for assessing people’s perceptions of their lives. In many ways, however, it has not yet been fully exploited as a qualitative method, especially in longitudinal research. As a complement to various tests and measures, it merits greater use in order to provide a fuller description and a richer understanding of the process of human life.

References

Birren, J.E. & Deutchman, D. (1991). Guiding Autobiography Groups for Older Adults: Exploring the Fabric of Life. Baltimore, MD: The Johns Hopkins University Press.
Bruner, J. (1987). Life as narrative. Social Research, 54(1), 11–32.
Holstein, J. & Gubrium, J. (2000). The Self We Live By: Narrative Identity in a Postmodern World. New York: Oxford University Press.
LeJeune, P. (1989). On Autobiography (trans. K. Leary). Minneapolis, MN: University of Minnesota Press.
McAdams, D. (1988). Power, Intimacy, and the Life Story: Personological Inquiries into Identity. New York: Guilford.
Olney, J. (Ed.) (1980). Autobiography: Essays Theoretical and Critical. Princeton, NJ: Princeton University Press.
Randall, W. (1995). The Stories We Are: An Essay on Self-Creation. Toronto: University of Toronto Press.
Rubin, D. (Ed.) (1996). Remembering Our Past: Studies in Autobiographical Memory. New York: Cambridge University Press.
Spence, D. (1982). Narrative Truth and Historical Truth. New York: W.W. Norton.
Svensson, T. (1996). Competence and quality of life: theoretical views of biography. In Birren, J.E., Kenyon, G.M., Ruth, J.-E., Schroots, J.J.F. & Svensson, T. (Eds.), Aging and Biography: Explorations in Adult Development (pp. 100–116). New York: Springer.

Torbjörn Svensson and William Randall

Related Entries

Qualitative Methods, Theoretical Perspective: Constructivist, Self-Presentation Measurement, Subjective Methods, Self, The (General)

AUTOMATED TEST ASSEMBLY SYSTEMS

INTRODUCTION

Historically, test construction in education and psychology has shown a development from: (1) the construction of standardized tests to the practice of assembling tests from item banks tailored to the test assembler’s specifications; (2) the use of intuitive rules of test construction to the application of model-based algorithms; and (3) manual sorting of items on index cards to selection by a computerized system.

Test assembly can be characterized as the task of finding a combination of items from an item pool that satisfies a list of content specifications and is optimal in a statistical sense. Formally, the problem has the structure of a constrained combinatorial optimization problem in which an objective function is maximized subject to a set of constraints, both typically modelled using 0–1 decision variables for the inclusion of the items in the test. Currently, a large variety of test assembly problems have been modelled this way and powerful algorithms for solving them are available.

MODELLING TEST ASSEMBLY PROBLEMS

A common view underlying all attempts to automate test assembly is to see each item in the pool as a carrier of a set of attributes relevant to the psychological variable or the domain of knowledge or skills the pool is designed to measure. A formal distinction can be made between the following


types of attributes:

1 Categorical attributes, such as item content, cognitive level, format, answer key, and item author. This type of attribute implies a discrete classification of the pool, that is, a partition with classes of items containing the same attribute.
2 Quantitative attributes, such as item parameter estimates, expected response time, previous exposure rate, and word counts. This type of attribute is a value on a variable or parameter that, for all practical purposes, is to be considered as continuous.
3 Logical attributes, which imply relations among subsets of items in the pool, mostly relations of inclusion or exclusion. A relation of inclusion exists if an item has to be presented with other items in the pool because they share a stem or the description of a case. A relation of exclusion exists if items cannot be in the same test form, for instance, because some of them clue the correct answer to the others.

In addition to item attributes, it is useful to introduce the notion of test attributes. A test attribute is defined as a (function on the) distribution of item attributes (van der Linden, 2000a). Examples of test attributes are: the distribution of item content or p-values in a test, its information function, the number of items with a gender orientation, and its (classical) reliability. A test can now be defined as a set of items from a pool that meets a list of specifications with respect to its attributes.

An important distinction is between test specifications formulated as constraints and as objective functions:

1 A specification is a constraint if it requires a test attribute to meet an upper limit, lower limit, or equality.
2 A specification is an objective function if it requires a test attribute to take a minimum or maximum value.

The standard format of a test assembly problem is illustrated by the following example of a classical test assembly problem:

Maximize test reliability
subject to
1 Number of items on knowledge of facts smaller than 15;
2 Number of items on application equal to 20;
3 All items have four response alternatives;
4 Number of items with graphics at least 10;
5 Total number of items equal to 50;
6 No items with more than 150 words;
7 All item difficulties larger than 0.40;
8 All item difficulties smaller than 0.60;
9 All item discrimination indices larger than 0.30;
10 Items 73 and 98 not together in the test.

When translating test specifications into constraints, each constraint is required to have a simple form. For example, though it seems convenient to combine Constraints 7 and 8 into a 'single' constraint ('All item difficulties between 0.40 and 0.60'), such a step would obscure the total number of constraints actually involved in the problem. Also, for each problem only one objective function can be optimized at a time. If we have more functions, optimizing one of them automatically gives a suboptimal solution for the others. Finally, exchanging objective functions and constraints sometimes has little effect. For example, we can replace the objective function in the above example by a constraint that requires the test to have reliability close to an educated guess of its optimum value, and replace Constraints 7 and 8 by an objective function that minimizes the distances between the item difficulties and a target value of 0.50. In large-scale testing programs, test assembly problems in a standard format can easily have more than 200 constraints. For a more complete introduction to item and test attributes, test specifications, and rules for translating specifications into objective functions and constraints, see van der Linden (in preparation; Chapter 2).

A mathematical solution to test assembly problems becomes possible if the objective function and constraints are modelled using variables for the decision to select the items in the test. Let index $i = 1, \ldots, I$ denote the items in the pool. The most commonly used decision variables are binary variables $x_i$, where $x_i = 1$ denotes the selection of item $i$ and $x_i = 0$ otherwise. (Other types of variables are sometimes necessary, though; see the section Some Applications.)
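The standard format and the binary decision variables introduced above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the six-item pool, the test length of three, the pair of mutually exclusive items, and the use of summed discrimination indices as a stand-in linear objective (test reliability itself is not a linear function of the decision variables). Exhaustive enumeration is used only because the pool is tiny; it is not the kind of algorithm applied to realistic pools.

```python
from itertools import product

# Hypothetical toy pool: (content area, p-value, discrimination index).
POOL = [
    ("facts", 0.45, 0.35),
    ("facts", 0.55, 0.40),
    ("application", 0.50, 0.50),
    ("application", 0.42, 0.45),
    ("application", 0.58, 0.32),
    ("facts", 0.48, 0.38),
]
TEST_LENGTH = 3      # assumed total number of items in the test
N_APPLICATION = 2    # assumed number of application items required
ENEMIES = (2, 3)     # assumed pair of items that may not appear together


def is_feasible(x):
    """Return True if the 0-1 vector x satisfies every toy constraint."""
    n_items = sum(x)
    n_application = sum(
        xi for xi, item in zip(x, POOL) if item[0] == "application"
    )
    # Quantitative bounds apply only to selected items (x_i = 1).
    difficulties_ok = all(
        0.40 * xi <= item[1] * xi <= 0.60 * xi for xi, item in zip(x, POOL)
    )
    return (
        n_items == TEST_LENGTH                      # test length (equality)
        and n_application == N_APPLICATION          # categorical attribute
        and difficulties_ok                         # quantitative attribute
        and x[ENEMIES[0]] + x[ENEMIES[1]] <= 1      # logical exclusion
    )


def objective(x):
    """Stand-in linear objective: summed discrimination of selected items."""
    return sum(xi * item[2] for xi, item in zip(x, POOL))


def assemble():
    """Enumerate all 0-1 vectors and return the best feasible one."""
    candidates = product((0, 1), repeat=len(POOL))
    feasible = [x for x in candidates if is_feasible(x)]
    return max(feasible, key=objective)


if __name__ == "__main__":
    print(assemble())
```

Calling `is_feasible` on a hand-picked vector of zeros and ones is also a quick way to check that a specification has been translated into the intended constraint.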


A few examples of constraints modelled in terms of decision variables are:

1 Constraint 2 in the above example is a constraint with respect to a categorical attribute. If $V_a$ denotes the set of indices of the items with the attribute Application, the constraint can be modelled as:

$$\sum_{i \in V_a} x_i = 20. \tag{1}$$

2 Constraint 7 is an example of a constraint with respect to a quantitative attribute. If $p_i$ denotes the p-value of item $i$, it can be modelled as:

$$p_i x_i \geq 0.40\, x_i, \quad i = 1, \ldots, I. \tag{2}$$

3 Constraint 10 is a logical constraint. It can be modelled as:

$$x_{73} + x_{98} \leq 1. \tag{3}$$

All these constraints are linear equalities or inequalities in the decision variables. This feature holds nearly universally for all test specifications used in practice. A simple recipe to check if constraints are modelled correctly is to substitute trial values for the decision variables and determine the truth-value of the constraint. Examples of objective functions modelled in terms of decision variables are given in the section on Applications, below.

SOLVING TEST ASSEMBLY PROBLEMS

Mathematical optimization problems with a linear objective function and linear constraints belong to the domain of Linear Programming (LP). The first to see the applicability of LP to test assembly were Feuerman and Weiss (1973) and Votaw (1952). If the decision variables are binary, the problem is known as a 0–1 LP problem. For a general introduction to these optimization techniques, see Nemhauser and Wolsey (1988) or Wagner (1972).

Once a test assembly problem has been modelled as a 0–1 LP problem, a solution can be found by solving the model for optimal values of the decision variables using one of the algorithms available from the literature. Although 0–1 LP problems are known to be NP-hard, that is, no algorithm is known that finds their solutions in a time bounded by a polynomial in the size of the problem, current technology has reached a level of sophistication that allows us to find exact solutions to problems with 1000–2000 variables and hundreds of constraints within seconds. Sometimes, test assembly models have the special structure of a network-flow programming problem. For such structures, solutions to problems of virtually unlimited size can be calculated within a second (for examples, see Armstrong, Jones, & Wang, 1995). A very efficient general-purpose LP software package is CPLEX 6.5 (ILOG, 2000). A dedicated software package that helps test assemblers to define their problem and then translates the problem into an LP model is ConTEST (Timminga, van der Linden & Schweizer, 1996).

An alternative to model-based test assembly is test assembly based on a heuristic. Test assembly heuristics are computer algorithms that assemble a test in a sequential fashion, that is, by selecting one item at a time. They do so using an item-selection criterion designed to meet the test specifications. Because of their sequential nature, heuristics are generally fast. However, steps early in the sequential process cannot be undone later, and heuristics therefore produce solutions that are generally not optimal. Another difference between the two approaches becomes manifest if a new class of test assembly problems has to be addressed. In an LP approach, the problem only has to be modelled and the model can be solved immediately by the algorithms and the software already available, whereas in a heuristic approach a new item-selection criterion and computer algorithm have to be developed and checked for the quality of their solutions. Examples of test assembly heuristics proven to be useful are given in Luecht (1998) and Swanson and Stocking (1993).

SOME APPLICATIONS

Target Information Function

The practice of assembling a test to meet a target for its information function was introduced in Birnbaum's (1968) pioneering work on


IRT-based test assembly. Theunissen (1985) was the first to realize that the problem can be solved using 0–1 LP, provided the information function is required to meet the target, $T(\theta)$, only in a series of discrete points, $\theta_k$, $k = 1, \ldots, K$. Uniform approximation of the test information function to a series of target values is possible through a maximin approach (van der Linden & Boekkooi-Timminga, 1989). In this approach, test information is required to be in intervals about the target values, $(T(\theta_k) - y,\ T(\theta_k) + y)$, and the objective function minimizes the common size of the intervals. Formally, the model is

$$\text{minimize } y \tag{4}$$

subject to

$$\sum_{i=1}^{I} I_i(\theta_k)\, x_i \leq T(\theta_k) + y, \quad k = 1, \ldots, K, \tag{5}$$

$$\sum_{i=1}^{I} I_i(\theta_k)\, x_i \geq T(\theta_k) - y, \quad k = 1, \ldots, K, \tag{6}$$

where $I_i(\theta_k)$ is the information of item $i$ at $\theta_k$ and $y$ is a real-valued decision variable with optimal value to be calculated by the algorithm. (LP problems with both integer and real-valued variables are known as mixed integer programming problems.) Of course, these equations should be extended with a set of constraints to meet the content specifications for the test.

An empirical example for a pool of 753 items from the Law School Admission Test (LSAT) is given in Figure 1. The test length was set at 75 items. (The actual LSAT is longer because it duplicates one of its sections.) In all, a 0–1 LP model with 804 variables and 276 constraints was needed to assemble the test to deal with all specifications (including an item-set structure of some of the sections; see the section Tests with Item Sets). The test information function had to approximate the target at five $\theta$ values. Figure 1 shows both the information function of the test assembled and the full target.

[Figure 1. Information function for test form assembled from an LSAT pool (solid line represents target). Horizontal axis: θ, from −3.0 to 3.0; vertical axis: test information, from 0.0 to 16.0.]

Multiple Test Forms

If examinees are allowed to take tests at different sessions, tests are often assembled as sets of parallel forms. The best result is obtained if such sets are assembled simultaneously. If they are assembled sequentially, the value of the objective function for each subsequent form can be expected to be worse than for its predecessors.

Multiple test forms can be assembled simultaneously if the following modifications are introduced:

1 The decision variables are replaced by variables $x_{if}$, with value 1 if item $i$ is assigned to form $f = 1, \ldots, F$ and value 0 otherwise.
2 Constraints are added to the model to guarantee that each item is assigned to no more than one form:

$$\sum_{f=1}^{F} x_{if} \leq 1, \quad i = 1, \ldots, I. \tag{7}$$

For the same LSAT item pool, Figure 2 shows the information functions of three parallel forms assembled to meet the same target as in Figure 1. For more on this application as well as methods to deal with large multiple-form assembly problems, see van der Linden and Adema (1998).

Tests with Item Sets

Tests with item sets are popular because they allow for the testing of knowledge or skills using the same case for more than one item. Often, the item pool has more items per set than needed in the test. Let $s = 1, \ldots, S$ denote the item sets in the pool, $i_s = 1, \ldots, I_s$ the items in set $s$, and $n_s$ the


number of items required from set $s$ if it is selected to be in the test.

[Figure 2. Information functions for three parallel test forms assembled from an LSAT pool (same target as in Figure 1). Horizontal axis: θ, from −3.0 to 3.0; vertical axis: test information, from 0.0 to 16.0.]

Tests with item sets can be assembled if the following modifications are introduced:

1 In addition to the decision variables for the items, 0–1 variables $z_s$ for the selection of set $s$ are introduced.
2 Constraints are added to the model that both coordinate the selection of items and sets and guarantee the correct number of items in each selected set:

$$\sum_{i_s=1}^{I_s} x_{i_s} - n_s z_s = 0, \quad s = 1, \ldots, S. \tag{8}$$

The LSAT form assembled for Figure 1 had an item-set structure for some of its sections. For other empirical examples and approaches to assembling tests with item sets, see van der Linden (2000a).

Other Applications

The above applications illustrate only a few of the options made possible by 0–1 LP test assembly. Other options include: (1) classical test assembly, with Cronbach's alpha represented by a combination of an objective function and a constraint; (2) assembly of tests required to match a given test form item by item; (3) assembly of tests measuring a multidimensional ability; (4) assembly of multiple test forms that differ systematically, for example, a set of subtests for a multi-stage testing system or testlets for testlet-based computerized adaptive testing (CAT); (5) assembly of tests with observed scores equated to those on a previous version of the test. A recent review of these and other applications is given in van der Linden (1998; in preparation).

FUTURE PERSPECTIVES

Though the development of computerized adaptive testing (CAT) was mainly motivated by statistical considerations, real-life CAT systems have to meet a host of nonstatistical specifications as well. A recent development is the use of 0–1 LP test assembly to introduce nonstatistical constraints in CAT (van der Linden, 2000b). The technique is applied through the assembly of a shadow test prior to the selection of the next item for an examinee. Shadow tests are tests that: (1) contain all items already administered; (2) meet all constraints that have to be imposed on the adaptive test; and (3) have maximum information at the last update of the ability estimator. The item actually administered is the most informative item in the shadow test not yet administered to the examinee. Because the shadow test is re-assembled after each update of the ability estimate, the adaptive test is maximally informative. In addition, because each shadow test meets all necessary constraints, the adaptive test does so as well.

Even though automated test assembly guarantees the best test from the pool, the result may be of low quality if the item pool is poor. In the parlance of 0–1 LP test assembly, the most important constraint imposed on the assembly of the test may be the poor composition of the item pool. It is therefore expected that an important future activity will be the development of methods to design item pools better targeted towards the tests to be assembled from them. A first attempt at optimal item pool design is given in van der Linden, Veldkamp and Reese (2000). A key notion in their approach is that of a design space for the item pool. This space is defined as the Cartesian product of all statistical and nonstatistical item attributes involved in the specifications for the tests from the pool. (This operation may require discretization of quantitative attributes.) A point in this space identifies a possible item in the pool. The technique of integer programming is then used to calculate an optimal


blueprint of the item pool from the specifications for the tests the pool has to serve. The blueprint specifies the optimal number of items required for each point in the design space.

CONCLUSION

Over the last decade several models and algorithms for automated test assembly have been developed. Automated assembly is now possible for almost every type of test and every set of specifications. This development seems timely because automated test assembly is the key to any form of computer-based testing, and current expectations are high about the improvements in testing practice made possible by the introduction of computers.

References

Armstrong, R.D., Jones, D.H. & Wang, Z. (1995). Network optimization in constrained standardized test construction. In Lawrence, K.D. (Ed.), Applications of Management Science: Network Optimization Applications, Vol. 8 (pp. 189–212). Greenwich, CT: JAI Press.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In Lord, F.M. & Novick, M.R. (Eds.), Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.
Feuerman, F. & Weiss, H. (1973). A mathematical programming model for test construction and scoring. Management Science, 19, 961–966.
ILOG, Inc. (2000). CPLEX 6.5 [Computer program and manual]. Incline Village, NV: Author.
Luecht, R.M. (1998). Computer-assisted test assembly using optimization heuristics. Applied Psychological Measurement, 22, 224–236.
Nemhauser, G. & Wolsey, L. (1988). Integer and Combinatorial Optimization. New York: Wiley.
Swanson, L. & Stocking, M.L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17, 151–166.
Theunissen, T.J.J.M. (1985). Binary programming and test design. Psychometrika, 50, 411–420.
Timminga, E., van der Linden, W.J. & Schweizer, D.A. (1997). ConTEST 2.0 Modules: A Decision Support System for Item Banking and Optimal Test Assembly [Computer program and manual]. Groningen, The Netherlands: iec ProGAMMA.
van der Linden, W.J. (1998). Optimal assembly of educational and psychological tests, with a bibliography. Applied Psychological Measurement, 22, 195–211.
van der Linden, W.J. (2000a). Optimal assembling of tests with item sets. Applied Psychological Measurement, 24, 225–240.
van der Linden, W.J. (2000b). Constrained adaptive testing with shadow tests. In van der Linden, W.J. & Glas, C.A.W. (Eds.), Computerized Adaptive Testing: Theory and Practice (pp. 27–52). Norwell, MA: Kluwer Academic Publishers.
van der Linden, W.J. (in preparation). Linear Models for Optimal Test Design. New York: Springer-Verlag.
van der Linden, W.J. & Adema, J.J. (1998). Simultaneous assembly of multiple test forms. Journal of Educational Measurement, 35, 185–198.
van der Linden, W.J. & Boekkooi-Timminga, E. (1989). A maximin model for test design with practical constraints. Psychometrika, 54, 237–247.
van der Linden, W.J., Veldkamp, B.P. & Reese, L.M. (2000). An integer programming approach to item pool design. Applied Psychological Measurement, 24, 139–150.
Votaw, D.F. (1952). Methods of solving some personnel classification problems. Psychometrika, 17, 255–266.
Wagner, H.M. (1972). Principles of Operations Research, with Applications to Managerial Decisions. London: Prentice-Hall.

Wim van der Linden

Related Entries

Item Response Theory
