The Robust Beauty of Improper Linear Models in Decision Making
ROBYN M. DAWES
University of Oregon

ABSTRACT: Proper linear models are those in which the predictor variables are given weights in such a way that the resulting linear composite optimally predicts some criterion of interest; examples of proper linear models are standard regression analysis, discriminant function analysis, and ridge regression analysis. Research summarized in Paul Meehl's book on clinical versus statistical prediction—and a plethora of research stimulated in part by that book—all indicates that when a numerical criterion variable (e.g., graduate grade point average) is to be predicted from numerical predictor variables, proper linear models outperform clinical intuition. Improper linear models are those in which the weights of the predictor variables are obtained by some nonoptimal method; for example, they may be obtained on the basis of intuition, derived from simulating a clinical judge's predictions, or set to be equal. This article presents evidence that even such improper linear models are superior to clinical intuition when predicting a numerical criterion from numerical predictors. In fact, unit (i.e., equal) weighting is quite robust for making such predictions. The article discusses, in some detail, the application of unit weights to decide what bullet the Denver Police Department should use. Finally, the article considers commonly raised technical, psychological, and ethical resistances to using linear models to make important social decisions and presents arguments that could weaken these resistances.

Author note: Work on this article was started at the University of Oregon and Decision Research, Inc., Eugene, Oregon; it was completed while I was a James McKeen Cattell Sabbatical Fellow at the Psychology Department at the University of Michigan and at the Research Center for Group Dynamics at the Institute for Social Research there. I thank all these institutions for their assistance, and I especially thank my friends at them who helped. This article is based in part on invited talks given at the American Psychological Association (August 1977), the University of Washington (February 1978), the Aachen Technological Institute (June 1978), the University of Groeningen (June 1978), the University of Amsterdam (June 1978), the Institute for Social Research at the University of Michigan (September 1978), Miami University, Oxford, Ohio (November 1978), and the University of Chicago School of Business (January 1979). I received valuable feedback from most of the audiences. Requests for reprints should be sent to Robyn M. Dawes, Department of Psychology, University of Oregon, Eugene, Oregon 97403.

Paul Meehl's (1954) book Clinical Versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence appeared 25 years ago. It reviewed studies indicating that the prediction of numerical criterion variables of psychological interest (e.g., faculty ratings of graduate students who had just obtained a PhD) from numerical predictor variables (e.g., scores on the Graduate Record Examination, grade point averages, ratings of letters of recommendation) is better done by a proper linear model than by the clinical intuition of people presumably skilled in such prediction. The point of this article is to review evidence that even improper linear models may be superior to clinical predictions.

A proper linear model is one in which the weights given to the predictor variables are chosen in such a way as to optimize the relationship between the prediction and the criterion. Simple regression analysis is the most common example of a proper linear model; the predictor variables are weighted in such a way as to maximize the correlation between the subsequent weighted composite and the actual criterion. Discriminant function analysis is another example of a proper linear model; weights are given to the predictor variables in such a way that the resulting linear composites maximize the discrepancy between two or more groups. Ridge regression analysis, another example (Darlington, 1978; Marquardt & Snee, 1975), attempts to assign weights in such a way that the linear composites correlate maximally with the criterion of interest in a new set of data.
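To make concrete what "choosing weights so as to optimize the relationship between the prediction and the criterion" means in the ordinary regression case, here is a minimal Python sketch. It is not from the article; the data values and variable names are invented for illustration, and the weights are simply the least-squares solution.

```python
import numpy as np

# Hypothetical standardized predictors (columns: GRE, GPA, selectivity)
# and a hypothetical criterion (faculty rating). Values are made up.
X = np.array([[ 0.5,  1.2, -0.3],
              [-1.0,  0.4,  0.8],
              [ 0.2, -0.7,  1.5],
              [ 1.3,  0.9, -1.1],
              [-0.6, -1.3,  0.2]])
y = np.array([3.8, 3.1, 3.5, 4.2, 2.6])

# A proper linear model chooses the weights that make the weighted
# composite predict the criterion as well as possible; for ordinary
# regression that is the least-squares solution.
X1 = np.column_stack([np.ones(len(y)), X])       # add an intercept column
weights, *_ = np.linalg.lstsq(X1, y, rcond=None)

composite = X1 @ weights                          # the model's predictions
r = np.corrcoef(composite, y)[0, 1]               # in-sample multiple correlation
print("fitted weights:", weights.round(3))
print("correlation of composite with criterion:", round(r, 3))
```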
Thus, there are many types of proper linear models, and they have been used in a variety of contexts. One example (Dawes, 1971) was presented in this Journal; it involved the prediction of faculty ratings of graduate students. All graduate students at the University of Oregon's Psychology Department who had been admitted between the fall of 1964 and the fall of 1967—and who had not dropped out of the program for nonacademic reasons (e.g., psychosis or marriage)—were rated by the faculty in the spring of 1969; faculty members rated only students whom they felt comfortable rating. The following rating scale was used: 5, outstanding; 4, above average; 3, average; 2, below average; 1, dropped out of the program in academic difficulty. Such overall ratings constitute a psychologically interesting criterion because the subjective impressions of faculty members are the main determinants of the job (if any) a student obtains after leaving graduate school. A total of 111 students were in the sample; the number of faculty members rating each of these students ranged from 1 to 20, with the mean number being 5.67 and the median being 5. The ratings were reliable. (To determine the reliability, the ratings were subjected to a one-way analysis of variance in which each student being rated was regarded as a treatment. The resulting between-treatments variance ratio (η²) was .67, and it was significant beyond the .001 level.) These faculty ratings were predicted from a proper linear model based on the student's Graduate Record Examination (GRE) score, the student's undergraduate grade point average (GPA), and a measure of the selectivity of the student's undergraduate institution.¹ The cross-validated multiple correlation between the faculty ratings and predictor variables was .38. Congruent with Meehl's results, the correlation of these latter faculty ratings with the average rating of the people on the admissions committee who selected the students was .19;² that is, it accounted for one fourth as much variance. This example is typical of those found in psychological research in this area in that (a) the correlation with the model's predictions is higher than the correlation with clinical prediction, but (b) both correlations are low. These characteristics often lead psychologists to interpret the findings as meaning that while the low correlation of the model indicates that linear modeling is deficient as a method, the even lower correlation of the judges indicates only that the wrong judges were used.

¹ This index was based on Cass and Birnbaum's (1968) rating of selectivity given at the end of their book Comparative Guide to American Colleges. The verbal categories of selectivity were given numerical values according to the following rule: most selective, 6; highly selective, 5; very selective (+), 4; very selective, 3; selective, 2; not mentioned, 1.

² Unfortunately, only 23 of the 111 students could be used in this comparison because the rating scale the admissions committee used changed slightly from year to year.

An improper linear model is one in which the weights are chosen by some nonoptimal method. They may be chosen to be equal, they may be chosen on the basis of the intuition of the person making the prediction, or they may be chosen at random. Nevertheless, improper models may have great utility. When, for example, the standardized GREs, GPAs, and selectivity indices in the previous example were weighted equally, the resulting linear composite correlated .48 with later faculty rating. Not only is the correlation of this linear composite higher than that with the clinical judgment of the admissions committee (.19), it is also higher than that obtained upon cross-validating the weights obtained from half the sample.
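A minimal sketch of the unit-weighting procedure just described, assuming invented placeholder data rather than the Oregon sample: standardize each predictor, add the z scores with equal weights of 1, and correlate the composite with the criterion.

```python
import numpy as np

def unit_weight_composite(X):
    """Equal (unit) weights on standardized predictors: an improper linear model."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)     # z-score each predictor column
    return Z.sum(axis=1)                         # add them up with weights of 1

# Hypothetical predictors (GRE, GPA, selectivity index) and faculty ratings.
X = np.array([[610, 3.2, 4],
              [520, 3.8, 2],
              [690, 3.5, 5],
              [450, 2.9, 1],
              [580, 3.1, 3]], dtype=float)
y = np.array([3.4, 3.9, 4.1, 2.7, 3.2])

composite = unit_weight_composite(X)
r = np.corrcoef(composite, y)[0, 1]
print("correlation of unit-weighted composite with criterion:", round(r, 3))
```

The point of the passage is that a composite built this way, with no weights estimated from the data at all, did better than both the admissions committee's clinical judgment and the regression weights cross-validated from half the sample.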
An example of an improper model that might be of somewhat more interest—at least to the general public—was motivated by a physician who was on a panel with me concerning predictive systems. Afterward, at the bar with his wife and me, he said that my paper might be of some interest to my colleagues, but success in graduate school in psychology was not of much general interest: "Could you, for example, use one of your improper linear models to predict how well my wife and I get along together?" he asked. I realized that I could—or might. At that time, the Psychology Department at the University of Oregon was engaged in sex research, most of which was behavioristically oriented. So the subjects of this research monitored when they made love, when they had fights, when they had social engagements (e.g., with in-laws), and so on. These subjects also made subjective ratings about how happy they were in their marital or coupled situation. I immediately thought of an improper linear model to predict self-ratings of marital happiness: rate of lovemaking minus rate of fighting. My colleague John Howard had collected just such data on couples when he was an undergraduate at the University of Missouri—Kansas City, where he worked with Alexander (1971). After establishing the intercouple reliability of judgments of lovemaking and fighting, Alexander had one partner from each of 42 couples monitor these events. She allowed us to analyze her data, with the following results: "In the thirty happily married couples (as reported by the monitoring partner) only two argued more often than they had intercourse."
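The improper linear model described above (rate of lovemaking minus rate of fighting) is simply a unit-weighted difference score. The sketch below uses invented counts and happiness ratings, not Alexander's data, purely to show the computation.

```python
import numpy as np

# Hypothetical weekly counts for a few couples and their self-rated happiness
# (e.g., on a 7-point scale). All numbers are invented for illustration.
rate_of_lovemaking   = np.array([5, 2, 3, 1, 4, 0])
rate_of_fighting     = np.array([1, 3, 1, 4, 2, 5])
self_rated_happiness = np.array([6, 3, 5, 2, 5, 1])

# The improper linear model: unit weights of +1 and -1, nothing estimated.
composite = rate_of_lovemaking - rate_of_fighting

r = np.corrcoef(composite, self_rated_happiness)[0, 1]
print("correlation of (lovemaking - fighting) with rated happiness:", round(r, 3))
```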