Industrial and Organizational Psychology, 1 (2008), 333–342. Copyright © 2008 Society for Industrial and Organizational Psychology. 1754-9426/08

FOCAL ARTICLE Stubborn Reliance on Intuition and Subjectivity in Employee Selection

SCOTT HIGHHOUSE Bowling Green State University

Abstract The focus of this article is on implicit beliefs that inhibit adoption of selection decision aids (e.g., paper-and-pencil tests, structured interviews, mechanical combination of predictors). Understanding these beliefs is just as important as understanding organizational constraints to the adoption of selection technologies and may be more useful for informing the design of successful interventions. One of these is the implicit belief that it is theoretically possible to achieve near-perfect precision in predicting performance on the job. That is, people have an inherent resistance to analytical approaches to selection because they fail to view selection as probabilistic and subject to error. Another is the implicit belief that prediction of human behavior is improved through experience. This myth of expertise results in an overreliance on intuition and a reluctance to undermine one’s own credibility by using a selection decision aid.

Correspondence concerning this article should be addressed to Scott Highhouse. E-mail: [email protected] Address: Bowling Green State University, Bowling Green, OH 43403.

Perhaps the greatest technological achievement in industrial and organizational (I–O) psychology over the past 100 years is the development of decision aids (e.g., paper-and-pencil tests, structured interviews, mechanical combination of predictors) that substantially reduce error in the prediction of employee performance (Schmidt & Hunter, 1998). Arguably, the greatest failure of I–O psychology has been the inability to convince employers to use them. A little over 10 years ago, Terpstra (1996) sampled 201 human resources (HR) executives about the perceived effectiveness of various selection methods. As the left side of Figure 1 shows, they considered the traditional unstructured interview more effective than any of the paper-and-pencil assessment procedures. Inspection of actual effectiveness of these procedures, however, shows that paper-and-pencil tests commonly outperform unstructured interviews. For example, the right side of Figure 1 shows the results of a meta-analysis conducted on the actual effectiveness of these same procedures for predicting performance in sales (Vinchur, Schippmann, Switzer, & Roth, 1998). Use of any one of the paper-and-pencil tests alone outperforms the unstructured interview, a procedure that is presumed to assess ability, personality, and aptitude concurrently.

[Figure 1: paired bar charts for four predictors (unstructured interview, specific aptitude test, personality test, GMA test). Left panel: perceived effectiveness, on a 1–5 axis. Right panel: actual effectiveness (sales), on a 0–1 axis.]

Figure 1. Perceived versus actual usefulness of various predictors. Note. Perceived effectiveness numbers are on a 1–5 scale (1 = not good; 3 = average; 5 = extremely good). Actual effectiveness numbers are correlations corrected for unreliability in the criterion and range restriction. Because Vinchur, Schippmann, Switzer, and Roth (1998) did not include interviews, the interview estimate is from Huffcutt and Arthur (1994) level 1 interview. GMA = general mental ability; personality = potency; specific aptitude = sales ability.

Although one might argue that these data merely reflect a lack of knowledge about effective practice, there is considerable evidence that employers simply do not believe that the research is relevant to their own situation (Colbert, Rynes, & Brown, 2005; Johns, 1993; Muchinsky, 2004; Terpstra & Rozelle, 1997; Whyte & Latham, 1997). For example, Rynes, Colbert, and Brown (2002) found that HR professionals were well aware of the limitations of the unstructured interview. Similarly, one of my students conducted a yet-unpublished survey of HR professionals (n = 206) about their views of selection practice. His data indicated that the HR professionals agreed, by a factor of more than 3 to 1, that using tests was an effective way to evaluate a candidate’s suitability and that tests that assess specific traits are effective for hiring employees. At the same time, however, these same professionals agreed, by more than 3 to 1, that you can learn more from an informal discussion with job candidates and that you can “read between the lines” to detect whether someone is suitable to hire. This apparent conflict between knowledge and belief seems loosely analogous to the common practice of preferring brand name cold remedies to store brand remedies containing the same ingredients. People know that the store brands are identical, but they do not trust them for their own colds.

Some might argue that the tide is turning. Much has been written on the merits of evidence-based management (Pfeffer & Sutton, 2006; Rousseau, 2006). This approach, much like evidence-based medicine, relies on the best available scientific evidence to make decisions. At the core of this movement is “analytics” or data-based decision making (e.g., Ayers, 2007). Discussions of number crunching in the arena of personnel selection, however, are almost always limited to anecdotes from professional sports (e.g., Davenport, 2006). Competing with the analytical point of view are books like Malcolm Gladwell’s (2005) Blink: The Power of Thinking Without Thinking and Gerd Gigerenzer’s (2007) Gut Feelings: The Intelligence of the Unconscious, which extol the virtues of intuitive decision making. Although the assertions of these authors have little relevance for the prediction of human performance, the popularity of their work likely reinforces the common belief that good hiring is a matter of experience and intuition.

Implicit Beliefs

My colleagues and I (Lievens, Highhouse, & De Corte, 2005) conducted a policy-capturing study of the decision processes of retail managers making hypothetical hiring decisions. We found that the managers placed more emphasis on competencies assessed by unstructured interviews than on competencies measured by tests, regardless of what those competencies were. They placed more emphasis, for instance, on Extraversion than on general mental ability when Extraversion was assessed using an unstructured interview (and general mental ability was assessed using a paper-and-pencil test). The opposite was found when Extraversion was assessed using a paper-and-pencil test and general mental ability was assessed using an unstructured interview! Clearly, these managers believed that good old-fashioned “horse sense” was needed to accurately size up applicants (see Phelan & Smith, 1958).

The reluctance of employers to use analytical selection procedures is at least partially a reflection of broader misconceptions that the general public has about how to go about assessing and selecting people for jobs. Consider two high-profile policy opinions on testing and selection in the United States.

• In 1990, the National Commission on Testing and Public Policy (1990) issued eight recommendations for testing in schools and the workplace. Among those was the statement as follows: “Test scores are imperfect measures and should not be used alone to make important decisions about individuals” (National Commission on Testing and Public Policy, 1990, p. 30). The commission’s chairman, Bernard Gifford of Apple Computer, commented, “We just believe that under no circumstances should individuals be denied a job or college admission exclusively based on test scores” (“Panel Criticizes Standard Testing,” 1990).
• In the landmark Supreme Court decision on affirmative action at the University of Michigan, Rehnquist concluded that consideration of race as a factor in student admission is acceptable, but it must be done at the individual level, with each applicant considered holistically. In concurrence, Justice O’Connor commented, “But the current [student selection] system, as I understand it, is a nonindividualized, mechanical one. As a result, I join the Court’s opinion . . . .” (Gratz v. Bollinger, 2003, Concurrence 1).

Although these positions sound reasonable on the surface, they represent fundamentally flawed assumptions. No one disputes that test scores are imperfect measures, but the testing commission implies that combining them with something else will correct the imperfections (rather than exacerbate them). The court’s majority opinion in Gratz suggests that individualized methods of selection are more fair and reliable than impersonal “mechanical” ones. Both of these examples illustrate two implicit beliefs about employee selection: (1) people believe that it is possible to achieve near-perfect precision in the prediction of employee success, and (2) people believe that there is such a thing as intuitive expertise in the prediction of human behavior. These implicit beliefs exert their influence on policy and practice, even though they may not be immediately accessible (Kahneman, 2003).

I acknowledge that there are a number of contextual reasons for resistance to selection technologies, including organizational politics, habit, and culture, along with the existing legal climate (e.g., Johns, 1993; Muchinsky, 2004). However, whereas contextual issues are often situation specific, these are universal “truths” about people. As such, understanding and studying them provides hope for overcoming user resistance to selection decision aids.

Irreducible Unpredictability

I recently came across an article in a popular trade magazine for executives, purportedly summarizing the state of the science on executive assessment (Sindelar, 2002). I was struck by a statement made by the author: “For many top-level positions, technical competence accounts for only 20 percent of a successful alignment. Psychological factors account for the rest” (pp. 13–14).[1]

[1] The author identified his affiliation as the “Institute for Advanced Business Psychology.”

Whether intentional or not, the author was clearly implying what is shown on the top of Figure 2: that 80% of the variance in executive success can be explained by psychological factors (presumably temperament or personality). Reality, however, is much more like the chart on the bottom of Figure 2, showing that most of the variance in executive success is simply not
predictable prior to employment. The business of assessment and selection involves considerable irreducible unpredictability; yet, many seem to believe that all failures in prediction are because of mistakes in the assessment process. Put another way, people seem to believe that, as long as the applicant is the right person for the job and the applicant is accurately assessed, success is certain.

[Figure 2: two charts. Top: technical competence 20%, psychological factors 80%. Bottom: technical competence 20%, psychological factors 10%, unpredictability 70%.]

Figure 2. Variance in success accounted for by technical competence and psychological factors.

The “validity ceiling” has been a continually vexing problem for I–O psychology (see Campbell, 1990; Rundquist, 1969). Enormous resources and effort are focused on the quixotic quest for new and better predictors that will explain more and more variance in performance. This represents a refusal, by knowledgeable people, to recognize that many determinants of performance are not knowable at the time of hire. The notion that it is still possible to achieve large gains in the prediction of employee success reflects a failure to accept that there is no such thing as perfect prediction in this domain. Campbell noted that our poor professional self-esteem is based on an unrealistic notion of what can be achieved in the prediction of employee success. Campbell wrote: “No external source imposed this [validity ceiling] standard on the discipline or even argued that there should be a standard at all” (p. 689).

Recall the earlier comment by the national testing commission, cautioning that tests are “imperfect” and must be supplemented with other things. It is remarkably similar to Viteles’ (1925) observation that “objective scores of vocational tests are at best uncertain diagnostic criteria” (p. 132). This early pioneer of I–O was arguing that standardized methods of assessment could only fill the proverbial glass halfway. Intuitive judgment was needed to fill it the rest of the way. Viteles wrote: “It is the opinion of the writer that in the cause of greater scientific accuracy in vocational selection in industry the statistical point of view must be supplemented by a clinical point of view” (p. 134). Countering this position was Freyd (1926), who cautioned against allowing intuition to creep into hiring decisions. Freyd, who represented the analytical viewpoint of selection, argued “allowing selection to be influenced by personal interpretations with their unavoidable prejudices instead of relying upon objective measures gives even less consideration to the well-being and interest of the individual worker” (p. 354). History proved Freyd prescient.

Table 1 shows the results of the earliest study investigating the relative effectiveness of standardized procedures alone versus supplementing those procedures with intuitive judgment (Sarbin, 1943). As you can see, academic achievement was better predicted by the standardized scores alone than by the scores plus clinical judgment. The notion that analysis outperforms intuition in the prediction of human behavior is among the most well-established findings in the behavioral sciences (Grove & Meehl, 1994; Grove, Zald, Lebow, Snitz, & Nelson, 2000).[2]

[2] Although few studies in I–O have explicitly made this comparison, there are a number of examples where tests alone outpredicted tests + intuition (e.g., Borneman, Cooper, Klieger, & Kuncel, 2007; Huse, 1962; Meyer, 1956).

Why, therefore, does the intuitive
perspective remain so appealing? Einhorn (1986) observed that a crucial distinction between the intuitive and the analytical approaches to human prediction is the worldview of the people making the judgments. According to Einhorn, the intuitive approach reflects a deterministic worldview, one that rejects the notion that the future is inherently probabilistic. This is contrasted with the analytical worldview, which accepts uncertainty as inevitable.

Table 1. Sarbin’s (1943) Investigation of Two Methods for Predicting Success of University of Minnesota Undergraduates Admitted in 1939

Predictor composite                                                            Correlation with criterion (r)
High school rank + college aptitude test                                       .45
High school rank + college aptitude test + intuitive judgment of counselors    .35

Consider the San Diego Chargers professional football team, which, despite having a regular season record of 14-2 in 2006, fired its head coach following a play-off loss. The fired coach had a reputation for leading teams to successful regular season records, only to lose the big games. The Chargers organization evidently failed to consider that the contribution of uncertainty to a play-off outcome is much greater than to a 16-game season record. Abelson (1985) found that knowledgeable baseball fans overestimated by a factor of 75 the contribution of skill (vs. chance) to the likelihood of a major league baseball player getting a hit in a given turn at bat.

Intuitive approaches to employee selection make the errors in selection ambiguous. Analytical approaches make them part of the process and hence visible. Considerable research suggests that ambiguity about the likelihood of an outcome (e.g., the operation has an unknown chance of success) encourages more optimism than a low known probability (e.g., the operation has a 20% chance of success; see Kuhn, 1997). There is little room for optimism when a composite of predictors is known to leave 75% of the variance unexplained. This may explain why selection procedures that are difficult to evaluate (e.g., feelings about “fit”) are so attractive. Einhorn (1986) noted, however, that one must be willing to accept error to make less error.

Myth of Expertise

I have argued that one of the reasons that people have an inherent resistance to analytical approaches to hiring is that they fail to view selection in probabilistic terms. A related but different reason for employer reticence to use selection decision aids is that most people believe in the myth of selection expertise. By this I mean the belief that one can become skilled in making intuitive judgments about a candidate’s likelihood of success. This is reflected in the survey responses of the HR professionals who believed in “reading between the lines” to size up job candidates. It is also evidenced in the phenomenal growth of the professional recruiter or “headhunter” profession (Finlay & Coverdill, 1999) and the perseverance of the holistic approach to managerial assessment (Highhouse, 2002).

Despite this widespread belief in intuitive expertise, the data suggest that it is a myth. For example, the considerable research on predicting human behavior per se shows that experience does not improve predictions made by clinicians, social workers, parole boards, judges, auditors, admission committees, marketers, and business planners (Camerer & Johnson, 1991; Dawes, Faust, & Meehl, 1989; Grove et al., 2000; Sherden, 1998). Although it is commonly accepted that some (employment) interviewers are better than others, research on variance in interviewer validity suggests that differences are due entirely to sampling error (Pulakos, Schmitt, Whitney, & Smith, 1996). Existing evidence suggests that the interrater reliability of the traditional (unstructured) interview is so low that, even with a perfectly reliable and valid criterion, interview-based judgments could never account for more than 10% of the variance in job performance (Conway, Jako, & Goodman, 1995).[3] This is troubling for a procedure that is supposed to simultaneously take into account ability, motivation, and person–organization fit. Keep in mind also that these findings are based on interviews that had ratings associated with the interviewers’ judgments. Thus, the unstructured interviews subjected to meta-analyses are almost certainly unusual and on the high end of rigor. The data do not paint a sanguine picture of intuitive judgment in the hiring process.

[3] Meta-analysis suggests that it accounts for negligible incremental validity over simple paper-and-pencil tests of cognitive ability and conscientiousness (Cortina, Goldstein, Payne, Davison, & Gilliland, 2000).

There are commonly two scholarly rebuttals to the arguments against prediction expertise. I will consider these in turn. One response to the limitations of intuitive approaches to selection is to focus on the ability of experts to spot idiosyncrasies in a candidate’s profile (Jeanneret & Silzer, 1998). Meehl (1954) noted that one limitation of analytical formulas was their inability to incorporate “broken-leg” cues. The term comes from an anecdotal example in which one is trying to predict whether or not a person will go to the movie on a particular day. A mechanical formula might take into account things like the nature of the movie (e.g., less likely to go to romantic comedy) or the weather (e.g., more likely to go on a rainy day). The mechanical procedure would not take into account, however, an event that is extremely rare (e.g., the person has a broken leg), and thus, the mechanical prediction will not be as accurate as a prediction based on a simple intuitive observation. A mechanical approach to selection would not, the logic goes, consider idiosyncratic characteristics of any particular job candidate; a seasoned expert would.

Another common response to criticisms of intuitive selection is to focus on the expert’s ability to interpret configurations of traits (Prien, Schippmann, & Prien, 2003). The notion behind this argument is that each candidate is unique, and one must consider each piece of information about the candidate in light of all the other pieces of information. In other words, assessing patterns of traits is more accurate than assessing traits individually. For example, Prien et al. noted that executive assessment requires a “dynamic interpretation” of applicant data, one that takes into account interactions between test scores and other observations (p. 125). This view is reinforced by leadership theorists who assert that leader characteristics exhibit complex configural relations with leadership outcomes (e.g., Zaccaro, 2007).

Even if we do accept that decision makers incorporate broken-leg cues and configurations of traits, existing evidence suggests that these things account for negligible variance in the predicted outcome. For example, Dawes (1971) modeled admission decisions of a four-person graduate admissions committee using a bootstrapping procedure. This is shown in Figure 3. Dawes found that the model (i.e., paramorphic representation) of the admission committee’s judgments outperformed the committee itself. More relevant to this discussion, however, was the fact that, whereas a linear combination of the expert cues correlated significantly (r = .25) with the criterion, the residual (which included configural judgments, broken-leg cues, and error) was inconsequential (r = .01). Camerer and Johnson (1991) noted that, despite accounting for a large portion of the error term, broken-leg cues and configural judgments consistently provide little incremental gain in prediction, even for so-called experts. The problem with broken-leg cues is that people rely too much on them because they present compelling stories. The tendency to be seduced by detailed stories causes people to ignore relevant information and to violate simple rules of logic (see Highhouse, 1997, 2001). Also, as one reviewer noted, broken legs are themselves constructs that can and should be measured reliably. The problem with trait configurations, on the other hand, is that
they require feats of information integration that contradict current understanding of human cognitive limitations (Ruscio, 2003). And true real-world examples of predictive interactions between job applicant characteristics are difficult to find (e.g., Sackett, Gruys, & Ellingson, 1998).

[Figure 3: diagram titled “Bootstrapped Models of Experts.” Expert predictions are linked to the predicted outcome with r = .19; the model of a linear combination of cues is linked to the outcome with r = .25; the expert residuals (configural judgments, “broken-leg” cues, and error) are linked to the outcome with r = .01.]

Figure 3. Results from Dawes’ (1971) examination of graduate admissions decisions.

Hastie and Dawes (2001) distilled from the vast literature on prediction “experts” the following stylized facts:

• They rely on few pieces of information.
• They lack insight into how they arrive at predictions.
• They exhibit poor interjudge agreement.
• They become more confident in their accuracy when irrelevant information is presented.

The obvious remedy to the limitations of expertise is to structure expert intuition and mechanically combine it with other decision aids, such as paper-and-pencil inventories. However, there would likely be considerable resistance to structuring or mechanizing the judgment process (e.g., Lievens et al., 2005; van der Zee, Bakker, & Bakker, 2002). Most people believe that aspects of an applicant’s character are far too complex to be assessed by scores, ratings, and formulas. An example of the irrationality of this bias against decision aids is the contempt with which most college football fans and commentators hold the Bowl Championship Series, which is a mechanical formula that incorporates expert ratings (e.g., coaches poll) and computer rankings (e.g., wins and losses of opponents) into an overall ranking of football teams. The nature of the complaints (“unplug the computers”) suggests that people do not want mechanical formulas making their expert decisions about who attends bowl games. A University of Oregon coach infamously declared: “I liken the BCS to a bad disease, like cancer” (Vondersmith, 2001). Another example of this bias against decision aids is the considerable patient resistance to diagnostic decision aids (Arkes, Shaffer, & Medow, 2007). Arkes and his colleagues found that physicians who made computer-based diagnoses of ankle injuries were perceived as less competent, professional, and thorough than physicians who made diagnoses without any aids. Indeed, the idea that (with the appropriate data) a physician might not even need to meet or interact with a patient to understand his or her personal health issues would be a hard sell to most people. Physicians, aware of this lay bias against “cookbook medicine,” grossly underutilize these valuable technologies in practice (Kaplan, 2001).[4] Hastie and Dawes (2001) noted that relying on expertise is more socially acceptable than relying on test scores or formulas. Research on medical decision making supports this contention. It is no wonder, therefore, that HR practitioners would be reluctant to undermine their status by administering a paper-and-pencil test, structuring an employment interview, or plugging ratings into a mechanical formula.

[4] This underutilization also results from overconfidence on the part of physicians in their own diagnostic expertise.

Concluding Remarks

We know quite a bit about applicant reactions to hiring methods (Hausknecht, Day, & Thomas, 2004), but very little attention has been given to user resistance to selection decision aids. Campbell (1990) noted: “We still do not know much about how to best communicate selection results to people outside the [I-O] profession” (p. 704). Fifteen years later, Anderson (2005) lamented: “In fact, the whole area of practitioner beliefs about selection methods and processes is a gargantuan one which research has made little or no inroads into” (p. 19). I have inferred from the general psychological literature, and the specific selection literature, two implicit beliefs that likely inhibit the widespread acceptance of selection technologies. These include the belief that it is possible to achieve near-perfect precision in predicting performance on the job and the belief that intuitive prediction can be improved by experience. People trust that the complex characteristics of applicants can be best assessed by a sensitive, equally complex human being. This does not stand up to scientific scrutiny, and I–O psychologists need to begin focusing their efforts on understanding how to navigate these waters. We can begin by drawing from the judgment and decision making and human factors literatures on how to better communicate uncertainty and error. We also need to learn how to better calibrate user expectations. Consider Muchinsky’s (2004) experience in communicating a .50 validity coefficient for a mechanical comprehension test:

    my pleasure regarding the findings was highly apparent to the client organization. It was at this point a senior company official said to me, “I fail to see the basis for your enthusiasm.” (p. 194)

Research on probability neglect (Sunstein, 2002) suggests that people make little distinction between probabilities that they consider small. In addition, research on evaluability (Hsee, 1996) has shown that most attributes cannot be evaluated without appropriate context. Perhaps if Muchinsky (2004) had compared his .50 to flipping a coin (.00) or to an unstructured interview (.20), management would have been more impressed. Perhaps management would have been more impressed by a common-language effect size indicator or by an expectancy chart. We simply do not have the research to guide these communication decisions.

The traditional unstructured interview has remained the most popular and widely used selection procedure for over 100 years (Buckley, Norris, & Wiese, 2000). This is despite the fact that, during this same period, there have been significant advancements in the development of selection decision aids. Guion (1965) argued that the waste of human resources caused by poor selection procedures should pain the professional conscience of I–O psychologists. It is true that people are not very predictable, but selection decision aids help.

References

Abelson, R. P. (1985). A variance explanation paradox: When a little is a lot. Psychological Bulletin, 97, 128–132.
Anderson, N. (2005). Relationship between practice and research in personnel selection: Does the left hand know what the right is doing? In A. Evers, N. Anderson, & O. Voskuijl (Eds.), The Blackwell handbook of personnel selection (pp. 1–24). Malden, MA: Blackwell.
Arkes, H., Shaffer, V. A., & Medow, M. A. (2007). Patients derogate physicians who use a computer-assisted diagnostic aid. Medical Decision Making, 27, 189–202.
Ayers, I. (2007). Super crunchers: Why thinking-by-numbers is the new way to be smart. New York: Random House.
Borneman, M. J., Cooper, S. R., Klieger, D. M., & Kuncel, N. R. (2007, April). The efficacy of the admissions interview: A meta-analysis. In N. R. Kuncel (Chair), Alternative predictors of academic performance: