Statistical Problems in ESP Research Author(s): Persi Diaconis Source: Science, New Series, Vol. 201, No. 4351 (Jul. 14, 1978), pp. 131-136 Published by: American Association for the Advancement of Science Stable URL: https://www.jstor.org/stable/1746684 Accessed: 13-12-2019 21:52 UTC


Summary. In search of repeatable ESP experiments, modern investigators are using more complex targets, richer and freer responses, feedback, and more naturalistic conditions. This makes tractable statistical models less applicable. Moreover, controls often are so loose that no valid statistical analysis is possible. Some common problems are multiple end points, subject cheating, and unconscious sensory cueing. Unfortunately, such problems are hard to recognize from published records of the experiments in which they occur; rather, these problems are often uncovered by reports of independent skilled observers who were present during the experiment. This suggests that magicians and psychologists be regularly used as observers. New statistical ideas have been developed for some of the new experiments. For example, many modern ESP studies provide subjects with feedback (partial information about previous guesses) to reward the subjects for correct guesses in hope of inducing ESP learning. Some feedback experiments can be analyzed with the use of skill-scoring, a statistical procedure that depends on the information available and the way the guessing subject uses this information.

The author is an assistant professor in the Department of Statistics, Stanford University, Stanford, California 94305.

SCIENCE, VOL. 201, 14 JULY 1978. 0036-8075/78/0714-0131$01.50/0 Copyright © 1978 AAAS

Is modern parapsychological research worthy of serious consideration? The volume of literature by reputable scientists, the persistent interest of students, and the government's funding of ESP projects make it difficult to evade this question. Over the past 10 years, in the capacity of statistician and professional magician, I have had personal contact with more than a dozen paranormal experiments. My background encourages a thorough skepticism, but I also find it useful to recall that skeptics make mistakes. For example, the scientific community did not believe in meteorites before about 1800. Indeed, in 1807 when a meteorite shower fell in Weston, Connecticut, an extended investigation was made by Professors Silliman and Kingsley of Yale. When Thomas Jefferson, then President of the United States and a scientist of no small repute, was informed of the findings, he reportedly responded, "Gentlemen, I would rather believe that those two Yankee Professors would lie than to believe that stones fell from heaven" (1).

Critics of ESP must acknowledge the possibility of missing a real phenomenon because of the difficulty of designing a suitable experiment. However, the characteristics which lead many to be dubious about claims for ESP (its sporadic appearance, its need for a friendly environment, and its common association with fraud) require of the most sympathetic analyst not only skill in the analysis of nonstandard types of experimental design but appreciation of the differences between a sympathetic environment with flexible study design and experimentation which is simply careless or so structured as to be impossible to evaluate.

In this article I use examples to indicate the problems associated with the generally informal methods of design and evaluation of ESP experiments, in particular the problems of multiple end points and subject cheating. I then review some of the commentaries of outstanding statisticians on the problems of evaluation. Finally, as an instance of using new analytic methods for nonstandard experiments, I give examples of some new statistical techniques that permit appropriate evaluation of studies that allow instant feedback of information to the subject after each trial, an entirely legitimate device used to facilitate whatever learning process may be involved.

Informal Design and Evaluation

A common problem in the evaluation of ESP experiments is the uncertainty about what outcomes are to be judged as indicative of ESP. Sometimes the problem can be dealt with by setting up a second experiment to verify the unanticipated but interesting outcome of a first experiment.

In a much discussed card-guessing experiment reported by Soal and Bateman (2), a receiving subject tried to guess the name of a card that was being thought about by a sending subject. When the data were first analyzed, no significant deviations from chance were observed. Several years later, the experimenters noticed that the guessing subject seemed to name not the card the sender was thinking about but rather the card two cards down in the deck (an example of precognition). Once this hypothesis was clearly formulated, the data were reanalyzed and new data were collected. The results stood up. The publication of Soal and Bateman's book touched off a series of lively articles (2, 3). The validity of Soal's experiment is still being debated [there are claims that the records are unreliable (4, 5)], but that he subjected the data to reanalysis after finding an unusual pattern seems acceptable to almost everyone. Whatever the view about reanalysis, the design and evaluation of the later experiments fall squarely within the domain of familiar scientific practice.

The problems are more acute in the next example. Three journal papers (6) describe experiments with a young man called B.D. These experiments took place at J. B. Rhine's Foundation for Research on the Nature of Man in Durham, North Carolina. The effects described, if performed under controlled conditions, seem like an exciting scientific breakthrough. In May of 1972, I witnessed a presentation by B.D., arranged by the Psychology Department at Harvard. I was asked to observe as a magician, and made careful notes of what went on. Although the experiments were not controlled, I believe they highlight many problems inherent in drawing inferences from apparently well-controlled experiments.

Most of the demonstrations I witnessed B.D. perform involved playing cards. In one experiment, two onlookers were invited to shuffle two decks of cards, a red deck and a blue deck. Two other onlookers were asked to name two different cards aloud; they named the ace of spades and the three of hearts. Both decks were placed face down on a table. We were instructed to turn over the top cards of each deck simultaneously and to
continue turning up pairs in this manner until we came to either of the named cards. The red-backed three of hearts appeared first. At this point, B.D. shouted, "Fourteen," and we were instructed to count down 14 more cards in the blue pack. We were amazed to find that the 14th card was the blue-backed three of hearts. Many other tests of this kind were performed. Sometimes the performer guessed correctly, sometimes he did not.

Close observation suggested that B.D. was a skilled opportunist. Consider the effect just described. Suppose that, as the cards were turned face upwards, both threes of hearts appeared simultaneously. This would be considered a striking coincidence and the experiment could have been terminated. The experiment would also have been judged successful if the two aces of spades appeared simultaneously or if the ace of spades were turned up in one deck at the same time the three of hearts was turned up in the other. There are other possibilities: suppose that, after 14 cards had been counted off, the next (15th) card had been the matching three of hearts. Certainly this would have been considered quite unusual. Similarly, if the 14th or 15th card had been the ace of spades, B.D. would have been thought successful. What if the 14th card had been the three of diamonds? B.D. would have been "close." In one instance, after he had been "close," B.D. rubbed his eyes and said, "I'm certainly having trouble seeing the suits today."

A major key to B.D.'s success was that he did not specify in advance the result to be considered surprising. The odds against a coincidence of some sort are dramatically less than those against any prespecified particular one of them. For the experiment just described, including as successful outcomes all possibilities mentioned, the probability of success is greater than one chance in eight. This is an example of exploiting multiple end points. To further complicate any analysis, several such ill-defined experiments were often conducted simultaneously, interacting with one another. The young performer electrified his audience. His frequently completely missed guesses were generally regarded with sympathy, rather than doubt; and for most observers they seemed only to confirm the reality of B.D.'s unusual powers.

Subject Cheating

In the experiments at Harvard, B.D. occasionally helped chance along by a bit of sleight of hand. During several trials, I saw him glance at the bottom card of the deck he was handling. He then cut the cards, leaving a quarter of an inch step in the pack. This fixed the location of the card he had seen. The cards were then spread out and a card was selected by one of the onlookers. When the selected card was replaced in the deck, B.D. secretly counted the number of cards between the card he had seen and the selected card. B.D. named a "random" card (presumably the card he had glanced at) and asked someone to name a small number. He disregarded the first number named and asked someone else to name another small number, this time the difference in location between the card B.D. had seen and the selected card. One of the observers counted down in the pack until he came to the "randomly" named card. Addressing the observer who originally selected a card, B.D. asked, "What card are you thinking of?" Sure enough, when the second small number was counted off, the selected card appeared. When presented in the confusing circumstances I have described, the trick seemed impossible. About ten of the observers were psychology faculty; the remaining five were graduate students. When they tried to reconstruct the details of this presentation, they could not remember exactly who had thought of the number and who had selected the card. They muddled the circumstances of this particular test with those of previous tests. I call this blending of details the "bundle of sticks" phenomenon. It is a familiar element in standard tricks: An effect is produced several times under different circumstances with the use of a different technique each time. When an observer tries to reconstruct the modus operandi, the weak points of one performance are ruled out because they were clearly not present during other performances. The bundle of sticks is stronger than any single stick.

B.D.'s performance went on for several hours. Later, some of the observers realized that B.D. often took advantage of the inevitable lucky breaks. However, his performance must have made quite an impression on some of the observers because the 13 July 1973 issue of Science reported that B.D. had been given a grant from Harvard "to explore the nature of his own psychic ability." My personal curiosity about the possibility of B.D. having powers that upset the known physical laws is fully satisfied, in the negative. This position is further discussed below.

Another exposé of which I have first-hand knowledge concerns Ted Serios. Serios claimed that he could create psychic photographs on Polaroid film in cameras he had never seen before. A group of scientists in Chicago and Denver had become convinced that there was no trickery involved; indeed, they believed that Serios had extraordinary psychic abilities. I became involved when Eisenbud's book, The World of Ted Serios (7), was being considered for review by Scientific American. A team of experienced magicians went to Denver to take a close look at Serios' performance. When we arrived, Serios was attempting to produce psychic images on TV film at a Denver TV station. Conditions were chaotic. Several news teams were present, each team having brought its own Polaroid film. After a short time, I managed secretly to switch about 20 boxes of their film with marked film we had brought along. We wanted to determine whether their film had been previously exposed. It had not been. The fact, however, that it had been so easy for me to switch the film by sleight of hand clearly indicated that the investigators did not have adequate control over the essential materials. Conditions remained like this during our several days' stay, and our observation revealed irreparable methodological flaws in all phases of the experiments. Serios openly used a small paper tube which he placed on his forehead pointing toward the camera "to help focus the thought waves." I observed that he occasionally placed this tube in front of the camera lens. On one trial, I thought I saw him secretly load something into the tube. When I asked to examine the tube, pandemonium broke loose. Several of the Denver scientists present jumped up, shouting things like, "You can't do that!" Serios hastily put the tube in his pocket. He was not searched. We were later able to duplicate Serios' pictures in several ways. After our exposé (8) of how we believe Serios obtained his results, Life magazine published an article about Serios' psychic powers, with no mention of our findings. Paranormal claims tend to receive far more media coverage than their exposés.

There are many other reports of subject cheating in ESP experiments. For example, Gardner (9) figured out how Russian women "saw" with their fingertips and, in a recent paper (10), exposes Uri Geller's supposedly "foolproof" alteration of the internal memory of several pieces of Nitinol wire. Nitinol is an alloy of nickel and titanium which has a memory. Under intense heat, a piece of Nitinol wire can be given a shape. When
cold, it can easily be reshaped between the fingers. After being heated, it snaps back to the original shape. One of the most persistently quoted proofs of Geller's paranormal powers is Eldon Byrd's claim that "Geller altered the lattice structure of a metal alloy in a way that cannot be duplicated." As usual, there is a story of amazing feats performed under test conditions (11). Gardner's competent detective work reveals the usual tale of chaotic conditions and bad reporting. There is an interesting twist here. Supporters of Geller argue that the event is amazing, even in light of chaotic conditions, since Geller could not have had access to a heat source of about 500°C, "the only known way to get this result" (11). Gardner found he could easily alter the memory of a piece of Nitinol wire with a pair of pliers or even by using his teeth.

Unfortunately, a nonmagician's memory of a magic feat is unreliable. For example, Hyman, a psychologist and magician, has described his visit to the Stanford Research Institute, during which Geller demonstrated many of his psychic feats (12). Hyman reports observing sleight of hand performed under uncontrolled conditions, much at variance with the published report (13) of the SRI scientists involved. Geller probably ranks as the most thoroughly exposed psychic of all times (12, 14, 15); yet the parascience community continues to defend him as a psychic who is often genuine, even though he occasionally cheats.

Some Conclusions

Rejecting the claims of a psychic who has been caught cheating raises thorny scientific problems. I am sure that B.D. used sleight of hand several times during the performance I witnessed. Yet, as one of the other observers remarked, "The people who introduced B.D. never said he didn't do card tricks; they just claimed he had extraordinary powers on occasion." During my encounter with Serios, a psychologist present put it differently: "Suppose he was only genuine 10 percent of the time; wouldn't that be enough for you?" My position is conservative: the similarity of the descriptions of the controlled experiments with B.D. and Serios to the sessions I witnessed convinces me that all paranormal claims involving these two performers should be completely discounted.

The fact that a trained observer finds reason to discredit two psychics is not, of course, sufficient evidence to discredit the existence of ESP or the integrity of other potential psychics. However, the pervasiveness of fraud in so many claims for ESP makes it extremely difficult for the disinterested observer to identify evidence worthy of credit. Whether Houdini was a disinterested witness, as he claimed, is hard to judge (16). But his tireless investigation and exposure of spiritualists in England and America (16) give powerful evidence of the extent of fraud in this domain and of the difficulties of detecting it. Randi, also a professional magician, has recently undertaken a detailed exposé of Uri Geller. Randi repeatedly documents the discrepancy between actual circumstances and those reported in newspapers and scientific journals (14).

Even if there had not been subject cheating, the experiments described above would be useless because they were out of control. The confusing and erratic experimental conditions I have described are typical of every test of paranormal phenomena I have witnessed. Indeed, ESP investigators often insist on nonnegative observers and surroundings. Because of this, skeptics have a difficult time gaining direct access to experimental evidence and must rely on published reports. Such reports are often wholly inadequate. According to Davey (17), Hansel (5), and others, it is not easy to notice crucial details during ESP experiments. For example, each of the studies referred to above describes experimental conditions beyond reproach. My own observation suggests that the conditions were not in control. Some of these problems can be overcome by insisting that expert magicians and psychologists, skilled at running experiments with human subjects, be included in study protocols.

Statisticians and ESP

The only widely respected evidence for paranormal phenomena is statistical. Classical statistical tests are reported in each of the published studies described above. Most often these tests are "highly statistically significant." This only implies that the results are improbable under simple chance models. In complex, badly controlled experiments simple chance models cannot be seriously considered as tenable explanations; hence, rejection of such models is not of particular interest. For example, the high significance claimed for the famous Zenith Radio experiment is largely a statistical artifact (18). Listeners were invited to mail in their guesses on a random sequence of playing cards. The proportion of correct guesses was highly significant when calculations were based on the assumption of random guessing on the part of each listener. It is well known (19) that the distribution of sequences produced by human subjects is far from random, and hence the crucial hypothesis of independence fails in this situation. More sophisticated analysis of the Zenith results gives no cause for surprise.

In well-run experiments, statistics can aid in the design and final analysis. The idea of deliberately introducing external, well-controlled randomization in investigation of paranormal phenomena seems due to Richet (20) and Edgeworth (21). Later, Wilks (22) wrote a survey article on reasonable statistical procedures for analyzing paranormal experiments popular at the time. Fisher developed new statistical methods that allow credit for "close" guesses in card-guessing experiments (23). Good (24) continues to suggest new experiments and explanations for ESP. The parascience community, well aware of the importance of statistical tools, has solved numerous statistical riddles in its own literature. Any of the three best known parascience journals is a source of a number of good surveys and discussions of inferential problems (25).

The actual circumstances of even well-run ESP tests are sufficiently different from the most familiar types of experiment as to lead even able and well-regarded analysts into difficulty; and the statistical community has a mixed record, with errors in both directions. On one hand, the celebrated statement by the Institute of Mathematical Statistics (26) was widely regarded as an endorsement of ESP analysis methods, a position that seems hard to justify. As an example of unjust criticism of ESP, consider Feller's review (27) of the methodology of ESP research (28).

Feller was an outstanding mathematician who made major contributions to the modern theory of probability. He attacked some of the statistical arguments used by J. B. Rhine and his co-workers (see 27). It appears now that several of Feller's criticisms were wrong. To give one instance: a standard ESP deck consists of five symbols repeated five times each to make up a 25-card deck. Feller found published records of the order of ESP decks before and after shuffling. He noticed that one could match up long runs of consecutive symbols in the two orders and took this as evidence of "unbelievably poor results of shuffling" (29). In a follow-up article, Greenwood and Stuart (28) pointed out that such runs of matching symbols did not prove poor
mixing. Since each symbol is repeated five times, long runs of matching symbols are inevitable. Feller had no respect for their remarks: "Both their arithmetic and their experiments have a distinct twinge of the supernatural," he wrote years later (29).

I believe Feller was confused. As proof of this, consider one of the experiments that Greenwood and Stuart (28) carried out to prove their point: they simulated two arrangements of ESP decks from a table of random numbers, and they showed that random arrangements exhibited long runs of matching symbols. Feller completely misunderstood this experiment; he thought that Greenwood and Stuart chose a sample of 25 from a set of five symbols with replacement. If the simulation were done by sampling with replacement, only those outcomes that had exactly five of each symbol would be useful. Since these are rare, the time required to complete the simulation reported by Greenwood and Stuart would have been lifetimes long. Thus, Feller found the report of the resulting samples "miraculously obliging." The comments of Feller that I have quoted, suggesting that the investigators were at best incompetent, persisted through three editions of his famous text. I have asked students and colleagues of Feller about this, and all have said that Feller's mistakes were widely known; he seemed to have decided the opposition was wrong and that was that.

Feedback Experiments

If ESP phenomena are real, we still do not know a reliable method for eliciting them; and any serious exploration of the subject requires that as much leeway as possible be provided for experimental designs that seem likely to produce an effect. In their search for replicable experiments, psychic investigators have modified the classical tests of ESP. Important changes include the use of targets of increasing complexity such as drawings or natural settings and greater use of feedback, either telling the subject whether the guess was right or wrong, or, in a card-matching experiment, what the last target card actually was. Unfortunately, the statistical tools for evaluating the outcome of more complex experiments are not available, and the ad hoc tests created by researchers are often not well understood. An article on remote viewing (30) provides an example. Apparently, in a typical phase of the experiment, nine locations (a local swimming pool, tennis court, and others) were selected from a list of 100 locations chosen to be as distinct as possible. A team of sending subjects went to each of the nine locations in a random order. A guessing subject tried to describe where they were. After each guess the guessing subject was given feedback by being taken to the true location. This is clearly a complex experiment to evaluate, and there are several reasons to discount the findings presented in (30). I give some of these reasons at the end of the next section. I first focus on the analysis of simpler feedback experiments.

Feedback of some sort is a much-used technique in modern ESP research (31). The appropriate analysis of a feedback experiment is easy in some simple cases but not at all clear in other cases. The assessment of such experiments requires new methods. Graham and I have explored some of the problems in a situation simple enough to allow mathematical analysis (32), and the following examples are drawn from that research.

Let us consider an experiment that involves a sending subject, a receiving subject, and a well-shuffled deck of 52 cards. The sending subject concentrates on each card in turn, and the receiving subject attempts to guess the suit and number of the card correctly.

No information case. If no additional information is available to the receiving subject, the chance of a correct guess at any point in the experiment is 1 in 52; thus, the expected number of correct guesses in a single run through the 52-card deck is 1. If we do not accept ESP as possible, it can easily be shown that any system of guessing leads to one correct guess on the average. However, the distribution of the number of correct guesses can vary widely as a function of the guessing strategy: if the same card is guessed 52 times in succession, then exactly one guess will be correct. It has been shown that the variance of the number of correct guesses is largest when each card is called only once (33).

Complete feedback case. Next, let us consider an experiment that includes giving information to the guesser. After each trial he is shown the card he has attempted to identify. The most efficient way the guesser can use this information is always to name a card he knows to be still in the deck. This strategy leads to an expected number of correct guesses of

\[ \frac{1}{52} + \frac{1}{51} + \frac{1}{50} + \cdots + 1 \approx 4.5 \]

in a single run through the deck, much larger than the one correct we expect with no information.

Partial information case. A third situation is created by giving only partial information. The guesser is told only if each guess is correct or not. In this situation, it can be shown that the guesser's optimal strategy is to name repeatedly any card (for example, the ace of spades) until he is told his guess is correct. After he is told that he has guessed correctly, he then repeatedly calls any card known to be in the deck until that card is guessed correctly or the run through the deck is completed. The expected number of correct guesses, if this optimal strategy is used, is

\[ \frac{1}{52!} + \frac{1}{51!} + \frac{1}{50!} + \cdots + \frac{1}{1!} \approx e - 1 \approx 1.72 \]

where e is the base of the natural logarithms. A subject given partial information can minimize the expected number of correct guesses by naming cards without repeating the same card until a correct guess is made. The guesser then repeatedly calls the card known not to be in the deck for the remaining calls. The expected number of correct guesses in this situation is well approximated by

\[ 1 - \frac{1}{e} \approx 0.632 \]

Similar analysis can be carried out with the standard 25-card ESP deck, consisting of five different symbols repeated five times. If no feedback information is given to the guessing subject, then, under the hypothesis of chance guessing, each guess has probability 1/5 of being correct. In a run through the 25-card deck, five correct guesses are expected. In the case of complete feedback, the best strategy is to guess the most probable card at each stage. This leads to 8.65 as the expected number of correct guesses, as shown by Read (34). In the case of partial information (telling the guesser only if each guess is right or wrong) things are more complicated. For example, the optimal strategy no longer is to choose the most probable card for each guess. It is easy to give a simple strategy that gets six cards correct on the average: Guess a fixed symbol until told that five correct guesses have been achieved, and then guess a second symbol for the remaining cards. There seems to be no simple closed-form expression for the optimal strategy; but the expected number of correct guesses, if the optimal strategy is used, satisfies a multivariable recurrence that makes dynamic programming techniques available. Gatto at Bell Laboratories succeeded in putting this problem on the
computer and, by solving the recurrence, showed that the expected number of correct guesses is 6.63, if the optimal strategy is used (35). The result took about 15 hours of CPU (central processing unit) time on a large computer.

These examples show that feedback can drastically change the expected number of correct guesses.

Simple Guessing Experiments with Feedback: Scoring Rules

Available evidence (19) suggests that subjects do not use their best possible strategies in simple probabilistic experiments. In more complicated situations (for example, if the experimenter uses a deck of cards with values repeated several times and gives the subject feedback as to whether his guess is "close" or not) the most efficient strategy may be very difficult to compute. Tart (31) gives references to the use of scoring rules that range from not taking into consideration the amount of information available to including the assumption that the subject is using the optimal strategy. Both of these approaches seem unnecessarily crude. The former might give an untalented subject a high score, while the latter might penalize a skillful subject who does not make efficient use of the information available to him.

For problems of this type, there exists a class of scoring rules which depend on the amount of information available to the subject and on the way the subject uses the information given. The idea is to subtract at the ith stage the probability of the ith guess being correct, given the history up to guess i. For example, if a guesser names a card he knows not to be in the deck, no penalty is subtracted. More formally, if G_i is the subject's guess on the ith trial and Z_i is one or zero as the ith guess is correct or not, then the skill-scoring statistic for n trials is defined by

$$S = \sum_{i=1}^{n} \left\{ Z_i - E(Z_i \mid G_1, G_2, \ldots, G_i, Z_1, Z_2, \ldots, Z_{i-1}) \right\} \qquad (1)$$

The conditional expected values that appear in Eq. 1 can be calculated for any past history with the use of combinatorial formulas related to problems of permutations with restricted positions (32). The statistic S is related to the skill-scoring rules used to evaluate weather forecasters (36). S has the property that, in the absence of skill (that is, ESP or talent), the expected score is zero for any guessing strategy, optimal or not.

Table 1. Card guessing with ten cards and partial feedback. Column 1 is trial number; column 2 is subject's guess; column 3 is feedback to subject; column 4 is the probability of the ith guess being correct, given the history up to time i; for example, subject guessed card 9 on trial 2 after being told that the guess on trial 1 was wrong, penalty = probability (9 on trial 2 given that the guess was wrong on trial 1) = 8/81; column 5 is the actual card in ith position. S = 3 - 1.0874 = 1.9126

Trial   Guess   Feedback   Penalty   Card
  1       1     Wrong      0.1000      3
  2       9     Wrong      0.0988      4
  3       6     Wrong      0.0976      8
  4       3     Wrong      0.0965      6
  5       2     Right      0.0955      2
  6       1     Wrong      0.1189     10
  7       4     Wrong      0.1031      9
  8       7     Right      0.1019      7
  9       6     Wrong      0.1282      5
 10       1     Right      0.1470      1
Total                      1.0874

Let us consider an example made explicit in Table 1. A deck of ten cards numbered from 1 to 10 was well mixed. A sender looked at the cards in sequence from the top down, and a guesser guessed at each card as the sender looked at it. After each trial the guesser was told whether she was correct or not. There were three correct guesses. If one ignores the availability of partial information, one comes to the conclusion that this response was two more than could be expected by chance. If one assumes that the guesser used the optimal strategy outlined in the partial information example in the previous section, then one would compare the number of correct guesses with 1.72, the expected number of correct guesses under the optimal strategy. Thus, one would conclude that the score of 3 was 1.28 higher than "chance." The guesses which were actually made are far from the optimal strategy. For example, on the second trial the optimal guess was 1, not 9; on the third trial the optimal guess was 1 or 9, not 6. In this case, the skill-scoring statistic scores this experiment as 1.91 higher than chance. Skill-scoring statistics can be tested by using an appropriate normal approximation available via martingale central limit theorems (32).

Skill-scoring provides an example of how new mathematical statistics can be used to evaluate experiments under nonstandard conditions. Clearly, experiments designed to include both feedback and sampling with replacement will be far easier to evaluate. The problems dealt with above (dependent trials coupled with feedback) arise in practice. For example, the analysis can be applied for reassessing experiments where subjects were seated within sight or hearing of one another, and an investigator suspects that unconscious sensory cuing has taken place. To be specific, a sender might, by his behavior, unconsciously indicate to the receiver whether his last guess was correct or not. This assumes, of course, that right and wrong were the only information cues transmitted. If the investigator thinks that the sender cued the guesser with information about each card as he looked at it, no statistical analysis can salvage the data.

One problem with feedback experiments is that they seem highly sensitive to clean experimental conditions. If the conditions break down, it will be hard to make sense of the data. For example, if a random number generator in an experiment with feedback is faulty, it may be that subjects can learn something of the pattern from the feedback (37). In the remote viewing experiment (30) referred to above, subjects included reports of where they had been taken during a "feedback trip" in the description of a current target. When a judge is given the subjects' nine transcripts, the judge is told which nine targets were visited but not the order of the visits. Information within a transcript allows a judge to rule out some of the potential targets and renders analysis of the results impossible. This is only one of many objections to the findings in (30). Because of inadequate specification of crucial details (38), I find it impossible to interpret what went on during this experiment.

Conclusions

To answer the question I started out with, modern parapsychological research is important. If any of its claims are substantiated, it will radically change the way we look at the world. Even if none of the claims is correct, an understanding of what went wrong provides lessons for less exotic experiments. Poorly designed, badly run, and inappropriately analyzed experiments seem to be an even greater obstacle to progress in this field than subject cheating. This is not due to a lack of creative investigators who work hard but rather to the difficulty of finding an appropriate balance between study designs which both permit analysis and experimental results. There always seem to be many loopholes and loose ends. The same mistakes are made again and again. The critiques and comments of Davey (17) and Hall (39) seem as relevant for modern studies as they did at the turn of the century. Regrettably, the problems are hard to recognize from published records of the experiments in which they occur; rather, these problems are often uncovered by reports of independent skilled observers who were present during the experiment.

There have been many hundreds of serious studies of ESP, and I have certainly read and been told about events that I cannot explain. I have been able to have direct experience with more than a dozen experiments and detailed secondhand knowledge about perhaps 20 more. In every case, the details of what actually transpired prevent the experiment from being considered seriously as evidence for paranormal phenomena.

References and Notes

1. As quoted in H. H. Nininger, Our Stone-Pelted Planet (Houghton Mifflin, Boston, 1933).
2. S. G. Soal and F. Bateman, Modern Experiments in Telepathy (Yale Univ. Press, New Haven, Conn., 1954).
3. G. R. Price, Science 122, 359 (1955); S. G. Soal, ibid. 123, 9 (1956); J. B. Rhine, ibid., pp. 11 and 19; P. E. Meehl and M. Scriven, ibid., p. 14; P. W. Bridgman, ibid., p. 15; G. R. Price, ibid., p. 17; ibid. 175, 359 (1972).
4. C. Scott and P. Haskel, J. Soc. Psych. Res. 118, 220 (1975).
5. C. E. M. Hansel, ESP a Scientific Evaluation (Scribners, New York, 1966).
6. E. F. Kelly and B. Kanthanani, J. Parapsychol. 36, 185 (1972); B. Kanthanani and E. F. Kelly, ibid. 38, 16 and 355 (1974).
7. J. Eisenbud, The World of Ted Serios (Morrow, New York, 1967).
8. D. Eisendrath and C. Reynolds, Pop. Photogr. 61 (No. 4), 81 (1967); J. Eisenbud, ibid. 61 (No. 5), 31 (1967).
9. M. Gardner, Science 151, 654 (1966).
10. _____, The Humanist 37 (May/June), 25 (1977).
11. E. Byrd, in The Geller Papers, C. Panati, Ed. (Houghton Mifflin, Boston, 1976).
12. R. Hyman, The Humanist 37 (May/June), 16 (1977).
13. H. Puthoff and R. Targ, Mind Reach (Delacorte, New York, 1977).
14. J. Randi, The Magic of Uri Geller (Ballantine, New York, 1976).
15. D. Marks and R. Kamman, Zetetic 1 (No. 2), 3 (1977).
16. H. Houdini, A Magician Among the Spirits (Harper, New York, 1924).
17. S. J. Davey, J. Soc. Psych. Res. 3, 8 (1887).
18. L. D. Goodfellow, J. Exp. Psychol. 23, 601 (1938).
19. P. Slovic, B. Fischoff, S. Lichenstein, Annu. Rev. Psychol. 28, 1 (1977); A. Tversky and D. Kahneman, Science 185, 1124 (1974).
20. C. Richet, Rev. Philos. 18, 41 (1884).
21. F. Y. Edgeworth, Proc. Soc. Psych. Res. 3, 190 (1885); 4, 189 (1885). For a historical review see M. McVaugh and S. H. Mauskopf [Isis 67, 161 (1976)].
22. S. S. Wilks, N. Y. Statistician 16 (No. 6), (1965); 16 (No. 7), (1965).
23. R. A. Fisher, Proc. Soc. Psych. Res. 34, 181 (1924); ibid. 38, 269 (1928); ibid. 39, 189 (1929).
24. I. J. Good, Parasci. Proc. 1 (No. 2), 3 (1974), and references given therein.
25. For a useful survey of this literature, see D. S. Burdick and E. F. Kelly, "Statistical methods in parapsychological research," in Handbook of Parapsychology, B. Wolman, Ed. (Van Nostrand, , 1977).
26. B. H. Camp, J. Parapsychol. 1, 305 (1937) (statement in notes section).
27. W. Feller, ibid. 4, 271 (1940).
28. J. A. Greenwood and C. E. Stuart, ibid., p. 299.
29. W. Feller, An Introduction to Probability Theory and Its Applications (Wiley, New York, ed. 3, 1968), pp. 56 and 407.
30. H. E. Puthoff and R. Targ, Proc. IEEE 64, 329 (1976).
31. C. Tart, Learning to Use ESP (Univ. of Chicago Press, Chicago, 1976), chaps. 1 and 2.
32. P. Diaconis and R. L. Graham, "The analysis of experiments with feedback to subjects," Ann. Stat., in press.
33. J. A. Greenwood, J. Parapsychol. 2, 60 (1938) and references therein.
34. R. C. Read, Am. Math. Mon. 69, 506 (1962).
35. M. A. Gatto, personal communication.
36. H. R. Glahn and D. L. Jorgensen, Mon. Weather Rev. 98, 136 (1970).
37. M. Gardner, N.Y. Rev. Books 24 (No. 12), 37 (14 July 1977).
38. R. Hyman, The Humanist 37 (November/December), 47 (1977); D. M. Stokes, J. Am. Soc. Psych. Res. 71, 437 (1977).
39. G. S. Hall, Am. J. Psychol. 1, 128 (1887).
40. I thank Tom Cover, , David Freedman, , Mary Ann Gatto, Seymour Geisser, Judith Hess, Ray Hyman, William Kruskal, Paul Meier, Lincoln Moses, , David Siegmund, Charles Stein, Stephen Stigler, Charles Tart, and Sandy Zabell for comments on earlier versions. Partially supported by NSF grant MPS74-21416.
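As an illustration of the skill-scoring statistic in Eq. 1, the penalties of Table 1 can be recomputed by counting the deck orders that remain consistent with the feedback at each trial. The Python sketch below is not taken from the article: the function names (count_consistent, skill_score, mean_score) are hypothetical, and the counting is done by a simple search over positions rather than by the combinatorial formulas of (32). It assumes a well-mixed deck of distinct cards with right-or-wrong feedback after each trial, as in the Table 1 example.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import permutations


def count_consistent(n, fixed, forbidden):
    """Count orderings of cards 1..n that agree with the feedback so far.

    fixed maps a position to the card known to be there (a "Right" guess);
    forbidden maps a position to the card ruled out there (a "Wrong" guess).
    """
    @lru_cache(maxsize=None)
    def fill(pos, used):
        # used is a bitmask of the cards already placed in positions < pos.
        if pos > n:
            return 1
        total = 0
        for card in range(1, n + 1):
            bit = 1 << card
            if used & bit:
                continue
            if pos in fixed and fixed[pos] != card:
                continue
            if forbidden.get(pos) == card:
                continue
            total += fill(pos + 1, used | bit)
        return total

    return fill(1, 0)


def skill_score(n, guesses, feedback):
    """Return (S, penalties) for one run, following Eq. 1."""
    fixed, forbidden, penalties = {}, {}, []
    for i, (guess, right) in enumerate(zip(guesses, feedback), start=1):
        possible = count_consistent(n, fixed, forbidden)
        with_guess = count_consistent(n, {**fixed, i: guess}, forbidden)
        # Penalty: chance that guess i is right, given the history so far.
        penalties.append(Fraction(with_guess, possible))
        if right:
            fixed[i] = guess
        else:
            forbidden[i] = guess
    return sum(feedback) - sum(penalties), penalties


def mean_score(n, strategy):
    """Average S over all n! deck orders for a deterministic strategy."""
    total, count = Fraction(0), 0
    for deck in permutations(range(1, n + 1)):
        guesses, feedback = [], []
        for card in deck:
            guesses.append(strategy(guesses, feedback))
            feedback.append(guesses[-1] == card)
        total += skill_score(n, guesses, feedback)[0]
        count += 1
    return total / count


# Guesses and feedback from Table 1.
guesses = [1, 9, 6, 3, 2, 1, 4, 7, 6, 1]
feedback = [False, False, False, False, True, False, False, True, False, True]
S, penalties = skill_score(10, guesses, feedback)
print(penalties[1])        # 8/81, the trial 2 penalty quoted in the caption
print(round(float(S), 4))  # 1.9126

# A deliberately poor strategy (always guess card 1) still averages to zero.
print(mean_score(4, lambda past_guesses, past_feedback: 1))  # 0
```

Exact fractions are used so that the trial 2 penalty can be compared directly with the 8/81 given in the Table 1 caption, and so that the zero-mean property shows up as an exact zero rather than a rounding artifact. The last line checks that property only for one small deck and one strategy; it is a spot check, not a proof.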

NEWS AND COMMENT

NATO Builds a Better Battle Tank But May Still Lose the Battle

The battle tank is still the principal weapon of a modern army. Far from driving the tank into extinction, technological developments such as the antitank missile have only hastened its rate of evolution. For the past 15 years the United States has stumbled from one fiasco to another in its attempts to design a new main battle tank, but seems at last to have a winner.

Both the failure and success of the tank development program are integrally related to a central crisis of the NATO alliance, the lack of cooperation in designing, developing, and producing new weapons. Through failure to standardize, the NATO allies at present field 31 different antitank weapons and seven different tanks. Such diversity causes a formidable logistics problem. It is the product of duplicative national research programs which waste about a third of the alliance's general purpose R & D budget. It is a principal factor in the alarming paradox that the backward economies of the Warsaw Pact can outproduce advanced NATO economies in tanks by a ratio of 4 to 1. NATO has recently cut the ratio to 2 to 1 yet still has only 7000 tanks deployed in Europe against the Warsaw Pact's 19,000. Nor does the quality of NATO tanks offset the gross deficiency in numbers. Germany's Leopard 1 and America's M60 are only about as capable as the Soviet T-72, not by any means its superior.

Though everyone agrees on the importance of NATO standardization, the commonly proposed remedies often seem worse than the disease. European countries, already fretful that they buy $8 of military equipment from the United States for every $1 they sell, view calls for standardization as another pressure to buy American. To offset its lack of appetite for European weapons, the United States has tried to develop weapons jointly with its allies, but with notable lack of success. Nowhere have the inherent problems of standardization been more vividly brought to light than in the Sisyphean attempts by America, Germany, and Britain to cooperate in developing a new main battle tank.

Tank technology is one of the military arts in which the United States does not possess a commanding lead; the Soviet, German, and British traditions of tank design have probably been superior. A British designed gun, the 105-mm cannon, is used by the tanks of all three NATO nations, and a revolutionary method of tank protection, known as Chobham armor, is also a British invention. German tanks, with their superior range and accuracy, were generally predominant in World War II until outnumbered. In part because of German expertise, Secretary of Defense Robert McNamara in 1963 initiated a German-American project to build a new main battle tank for the 1970's, the MBT-70.

The designers of the MBT-70 produced a tank that could squat, so as to lower its silhouette. They put the driver in the turret, instead of the hull, and kept him facing forward when the turret turned by a counter-rotating cylinder. "It was an all singing, all dancing, thing. Everybody thought it was absolutely marvelous but far too expensive and far too complicated for any crew to handle," says one NATO observer. As the cost approached $1 million a tank, Congress killed the MBT-70 in 1969. Both sponsoring countries went their separate ways, the Germans starting work on the Leopard 2 and the American

0036-8075/78/0714-0136$01.00/0 Copyright © 1978 AAAS
