<<

Reconsidering the Fundamental Contributions of Fisher and Neyman on Experimentation and Author(s): Stephen E. Fienberg and Judith M. Tanur Source: International Statistical Review / Revue Internationale de Statistique, Vol. 64, No. 3 (Dec., 1996), pp. 237-253 Published by: International Statistical Institute (ISI) Stable URL: http://www.jstor.org/stable/1403784 Accessed: 19/10/2008 12:47

Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available at http://www.jstor.org/page/info/about/policies/terms.jsp. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtained prior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content in the JSTOR archive only for your personal, non-commercial use.

Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained at http://www.jstor.org/action/showPublisher?publisherCode=isi.

Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printed page of such transmission.

JSTOR is a not-for-profit organization founded in 1995 to build trusted digital archives for scholarship. We work with the scholarly community to preserve their work and the materials they rely upon, and to build a common research platform that promotes the discovery and use of these resources. For more information about JSTOR, please contact [email protected].

International Statistical Institute (ISI) is collaborating with JSTOR to digitize, preserve and extend access to International Statistical Review / Revue Internationale de Statistique.

http://www.jstor.org InternationalStatistical Review (3), 1966, 64, 237-253, Printedin Mexico ? InternationalStatistical Institute

Reconsideringthe Fundamental Contributions of Fisher and Neyman on Experimentation and Sampling

Stephen E. Fienberg1 and Judith M. Tanur2 1Departmentof , CarnegieMellon University,Pittsburgh, PA 15213-3890, USA 2Departmentof Sociology, State University of New Yorkat Stony Brook,Stony Brook,NY 11794-4356, USA

Summary .A. Fisher and are commonly acknowledged as the statisticians who provided the basic ideas that underpin the and the design of sample surveys, respectively. In this paper, we reconsider the key contributions of these great men to the two areas of research. We also explain how the controversy surrounding Neyman's 1935 paper on agricultural experimentation in effect led to a split in research on experiments and on sample surveys.

Key words:Clustering; Design of experiments;Design of samplesurveys; Randomization; Stratification.

1 Introduction

The 1920s and 1930s marka critical period in the developmentof statistics. Much of the modern theory of estimationand statisticaltesting emergedfrom paperspublished during this period by two of the most importantstatisticians of the century,Ronald Aylmer Fisher and Jerzy Neyman. In many ways, the theoreticalwork of these two greatmen was stimulatedby practicalproblems they encoun- tered in their day-to-day statisticalwork of agriculturalexperimentation and, later for Neyman, the sampling of human populations.Moreover many believe that their greatest accomplishmentswere not in the realm of theoriesof inferencebut rather in theirarticulation of theoryand principlesunder- pinning the two key modernmethods for the scientific collection of data-randomized experiments and randomsampling. In connection with a largerproject on experimentsand sample surveys, and in part stimulatedby the recent occasion of the 100th anniversaryof Neyman's birth, we have rereadmuch of the work of both Fisher and Neyman in the areas of sampling and experimentation.Our story for both Fisher and Neyman begins with the work that led to their 1923 paperson agriculturalexperimentation. For Neyman, this work continuedlargely in the directionof samplingand culminatedin his now famous 1934 paper on the topic. Despite the many contributionsof others to the technical development of sampling in the period from 1900 to 1925, Neyman's 1934 paper played a pivotal role in turning the dry of expectations into real sampling plans for actual randomly selected large scale surveys. For Fisher, his 1923 paperled to furtherpioneering work in experimentaldesign that culminatedin the publicationof his 1935 book, The Design of Experiments.Fisher's book played a catalytic role in the actual use of randomizationin controlled experimentsthat is similar to the role Neyman's 1934 paperplayed in the use of methods for randomsampling. The year 1935 was a 238 S.E. FIENBERGand J.M. TANUR critical one in the developmentsin the fields of experimentationand sampling,not simply because of the publicationof Fisher'sbook but also because of a majorcontroversy between Fisher and Neyman engendered by Neyman's (1935) paper on agriculturalexperimentation. Because of the bitterness that grew out of this dispute (and a related one between Fisher on the one hand and Neyman and Pearson on the other, over tests of hypotheses and then later over confidence intervals),Fisher and Neyman were never able to bring their ideas together and benefit from the fruitful interactionthat would likely have occurredhad they done so. And in the aftermath,Neyman staked out intellectual responsibilityfor sampling while Fisher did the same for experimentation.It was in partbecause of this rift between Fisher and Neyman that the fields of sample surveys and experimentationdrifted apart. In the next section, we begin with Neyman'simportant 1923 paperon agriculturalexperimentation, and we remind the reader of the interrelatednature of fundamentalideas on experiments and surveys and the pivotal role that randomizationplayed in both areas. Then our narrativeproceeds by interweaving accounts of the work by Fisher and Neyman on experimentationand sampling, and controversiesthat surroundedthem. Our account culminateswith the clash between Fisher and Neyman over ideas in Neyman's 1935 paper.Throughout we include relevantbiographical material assembled from a varietyof sources includingBox (1978, 1980), Lehmann& Reid (1982) and Reid (1982).

2 Parallels between Surveys and Experiments: Innovations in Neyman's 1923 Work on Agri- cultural Experimentation

JerzyNeyman was bornof Polish parentsin 1894, in Bendery,which has been variouslylabeled as Rumania,Ukraine, and Moldavia because of the vicissitudes of border-drawingin EasternEurope. In 1912 he enteredthe Universityof Kharkov(which laterbecame Maxim GorkiUniversity) to study mathematics.Finishing his undergraduatestudies in 1917, Neyman remained at the University of Kharkovto begin to preparefor an academic career;he also received an appointmentas a lecturer at the KharkovInstitute of Technology.In the fall of 1920 he passed the examinationfor a Masters degree and became a lecturerat the University.Thus until afterhis 27th birthdayNeyman was doubly isolated-both by living in a provincial city and by being part of an ethnic minority in that city. In the spring of 1921, learning that he was to be arrested,Neyman fled to the country home of a relative,where he supportedhimself by teachingthe childrenof peasants.In the summerhe returned to Kharkovand thence to Bydgoszcz in northernPoland, as partof an exchangeof nationalsbetween Russia and Poland agreed to at the end of the Russian-Polish War.In Bydgoszcz, Neyman went to work as "senior statistical assistant"at the National AgriculturalInstitute and it was there that he wrote two long papers on agriculturalexperimentation that were published in 1923 in Polish (Splawa-Neyman, 1990 [1923a], 1925 [1923b]). In this section we focus primarilyon the first of these papers, and we returnto the second in Section 4, discussing its 1925 republicationin English. An excerpt of the 1923 paper(Splawa-Neyman, 1990 [1923a]) on experimentationwas recently translatedfrom the Polish originaland publishedin StatisticalScience, and in it we were especially struckby the importancethat repeatedrandom sampling played in Neyman's thinking.This seemed to us to foreshadow,at the very least, the use of randomizationin experimentation.Reid (1982, p. 44) quotes Neyman considerablylater as denying his priorityhere:

". .. I treatedtheoretically an unrestrictedlyrandomized agricultural experiment and the randomizationwas consideredas a prerequisiteto probabilistictreatment of the results. This is not the same as the recognition that without randomizationan experimenthas little value irrespectiveof the subsequenttreatment. The latterpoint is due to Fisher and I consider it as one of the most valuableof Fisher'sachievements." The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 239

Since we see one of the major purposes of experimentalrandomization as the necessary precon- dition for probabilisticinference from the results, we wouldjoin Rubin (1990, p. 477) in saying that had Neyman later claimed priorityrather than denying it, we would have had no reason to quarrel with that claim. Rubin (1990) also reminds us that the use of randomizationwas "in the air" in the early 1920s, citing Student(1923, pp. 281-282) and Fisher & MacKenzie (1923, p. 473). There is, however, anotherimportant feature of Neyman's first paper on the topic of agricultural experimentationwe think especially worthy of note. For a numberof years, we have pursued the parallels and linkages between surveys and experiments(e.g., see Fienberg & Tanur,1987, 1988, 1989) and, in particular,Fisher's and Neyman's roles as progenitorsof ideas in both areas. Although in fact surveys and experimentshad developed very long and independenttraditions by the start of the 20th century, it was only with the rise of ideas associated with mathematicalstatistics in the 1920s that the tools for majorprogress in these areasbecame available.The key intellectualidea was the role of randomizationor random selection, both in experimentationand in sampling, and both Neyman and Fisher utilized this idea, althoughin differentways. But then, as now, it is clear that one can adaptthe ideas from one field to the other and that one can profitablylink them, as in sampling embedded within experimentsor experimentsembedded within surveys. Neyman's first 1923 paperon experimentaldesign embodies this sort of adaptation.He conceptu- alizes the assignment of treatmentsto units in an experimentas the drawingwithout replacement of balls from urns, one urn for each treatment.These urnshave the special propertythat the removal of a ball (representingthe outcome of an experimentalunit) from one urn causes it to disappearfrom the other urns as well. Thus Neyman shows that when there is a finite pool of experimentalunits that need to be assigned to treatments,the randomassignment of units to treatmentsis exactly parallel to the randomselection of a sample from a finite population.Hence, when the numberof units used in an experimentis a large fractionof the units in the population,a finite populationcorrection must be used in an experiment,just as it is in a sample survey. The parallel between experiments and sampling is particularlyclose in this case. Nevertheless,all but a few moderninvestigators have lost sight of the parallel and fail to take advantageof insights offered in the parallelliterature.

3 Fisher's Initial Contributions to the Design of Experiments in the 1920s While it is only recently that statisticiansand others have rediscoveredNeyman's early contribu- tions to the design of experiments,Fisher has long been recognized for the innovationhe broughtto the field. Fisher was bornin 1890 in London and it was duringhis undergraduateyears at Cambridgethat his interestsin eugenics, genetics, and biometrystimulated his interestin probabilityand statistics.After graduationin 1913, he taughtmathematics and physics at a series of schools but pursuedresearch in both genetics and statistics and published his first majorpapers on these topics. This work led, six years later, to a temporarystatistical position in 1919 at RothamstedExperiment Station, where the director,Sir John Russell, wantedsomeone who would be preparedto examine the accumulateddata on wheat yields from BroadbalkFields in order to elicit furtherinformation that might have been missed. Russell quickly recognized Fisher's genius and set about convertingthe temporaryposition to a more permanentone (see Box, 1978, pp. 96-97). In 1921, Fisherpublished the firstof a series of papersentitled "Studiesin crop variation"intended for the JournalofAgricultural Science', in which he presentedsome of his reanalysesof the Broadbalkdata. But it was in the second in this series of papers,written with W.A. Mackenzie (1923), thatFisher's new ideas on agriculturalexperimentation emerged. Fisher & Mackenzie (1923) contains two key innovations for the design of experiments: the 'Actually the third paper in the series appearednot in the Journal of Agricultural Science but in the Philosophical Transactionsof the Royal Society, and without the title, "Studiesin crop variation".Two later papers with Thomas Eden in 1927 and 1929 are labelled as V and VI. 240 S.E. FIENBERGand J.M. TANUR introductionof the analysis of ,adapted from Fisher's 1918 paperon Mendelianinheritance, and randomization.In fact, in introducingthe , the authors argue that it is conditionalon randomization:

"Further,if all the plots are undifferentiated,as if the numbershad been mixed up and writtendown in randomorder, the averagevalue of each of the two partsis proportional to the numberof degrees of freedomin the variationof which it is compared."(p. 315) The spirit of this statementis not unlike that in Splawa-Neyman (1990 [1923a]): "The goal of a field experimentwhich consists of the comparisonof n varietieswill be regardedas equivalentto the problemof comparingthe numbersal, a2, . . ., an or theirestimates by way of drawingseveral balls from an urn."(p. 467) Both speak of randomization,but in hypotheticalterms. There are dramatic differences between Fisher's and Neyman's treatmentsas well, with Fisher focusing on the actual dataanalysis and Neymanputting much emphasis on the mathematicalformulation of finitesampling and expectations. Neither Fisher nor Neyman actually wrote about the implementationof randomizationin 1923, however.In fact, as Cochran(1980, p. 18) notes in connectionwith the 2 x 12 x 3 factorialexperiment describedby Fisher & Mackenzie (1923): "No randomizationwas used in this layout. Following the procedurerecommended at thattime, the layout apparentlyattempts to minimizethe errorsof the differencesbetween treatmentmeans by using a chessboard arrangementthat places different treatments near one another as far as is feasible. This arrangementutilizes the discovery from uniformitytrials that plots nearone anotherin a field tend to give closely similaryields. A consequence is, of course, that the analysis of varianceestimate of the errorvariance perplot, which is derivedfrom differences in yield perplots receivingthe same treatment, will tend to overestimate,since plots treatedalike are fartherapart than plots receiving differenttreatments. Fisher does not commenton the absenceof randomizationor on the chessboarddesign. Apparentlyin 1923 he had not begun to think about the conditions necessary for an experimentto supply an unbiasedestimate of error." A furtherproblem with Fisher's 1923 analysis was his failure to recognize the split-plot structure of the actual experimentwith respect to a third factor, potassium.This he correctedin Statistical Methodsfor Research Workers(1925). Thus, in 1923, we see both Fisher and Neyman as having arrivedat similarpositions with respect to randomization,although Fisher alreadyhad moved ahead in his thinkingabout other features of design. Box (1980) argues, that the statementin Fisher & Mackenzie on the analysis of variance rested heavily on Fisher's appreciationof the requirementsof underlying theory for the normal distributionupon which hereans, reld in his analysis, as well as upon Fisher's "geometricrepresentation that by then was second natureto him. He could picture the distributionof n results as a pattern in n-dimensional space, and he could see that randomizationwould produce a symmetry in that patternrather like that producedby a kaleidoscope,and which approximatedthe requiredspherical symmetryavailable, in particular,from standardnormal theory assumptions"(p. 2). While we agree thatFisher's geometricalintuition was highly refined,we see little supportin the 1923 paperfor this conclusion. But by the publicationof StatisticalMethods in 1925, Fisherclearly had a deeper appreciationof the function of randomizationin experimentation: "Thefirst requirement which governsall well-plannedexperiments is thatthe experiment should yield not only a comparisonof differentmanures, treatments, varieties, etc., but also a means of testing the significance of such differences as are observed .... For our test of significance to be valid the differences in fertility between plots chosen as The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 241

parallels must be truly representativeof the differences between plots with different treatment[s];and we cannot assume that this is the case if our plots have been chosen in any way accordingto a prearrangedsystem." (p. 248) According to Fisher, valid tests result if treatmentsare assigned to plots wholly at random,and here we have a clear departurefrom the 1923 papers. The treatmentof experimentationin StatisticalMethods, which follows this discussion of random- ization, consists of only a few pages. Fisher introducedblocking as a device to increase the accuracy of treatmentcomparisons and he illustratedits use in both randomizedblock experiments and in Latin squares. He described the split-plot structureof the experimentanalyzed in the 1923 paper, which should have requiredtwo levels of randomizationfor its validity, but Fisher did not discuss this point in the original or subsequenteditions of the book. Later embellishments to the theory of experimentaldesign came in Fisher (1926) where he laid out in a more systematic fashion the principlesunderlying field experimentation,including the need for replication, and in which he expoundedupon the notion of factorialdesigns with new insights, including the role of confounding.The example used in this paperwas from a real experimentbeing run by a colleague, ThomasEden, and Eden & Fisher (1927), presentsa more detaileddescription of the experimentand an analysis of its results. Factorialstructures, Fisher (1952) would later observe, had their roots in the "simultaneousinheritance of Mendelianfactors". Throughoutthis period, Fisher continued to work with colleagues at Rothamstedand elsewhere in the design and analysis of experimentsthat implementedhis ideas on randomizationand factorial structures.But not all of his Rothamstedcolleagues had fully accepted Fisher's insistence on ran- domization as the foundationof experimentatiion(Box, 1980). In fact, Fisher's 1926 article was at least in part a response to an article by Sir John Russell, directorof Rothamsted,on the state of the art in experimentaldesign. In his paper,Russell had noted Fisher's argumentfor randomizationbut said that it was impossible to use in practice!Despite this disagreement,Russell allowed Fisher and Eden to run theircomplex randomizeddesigns on fields at Rothamstedand this ultimatelyconvinced many others of the value of Fisher's innovativeideas (Box, 1980).

4 Neyman's Initial Contributions to the Theory of Sampling in the 1920s In 1924, Neyman obtainedhis doctors degree from the Universityof Warsaw,using as a thesis the work done in Bydgoszcz, and in 1925, he obtaineda governmentgrant to study for a year in London with .Before his arrivalin London, Neyman had shippedto KarlPearson several of his statisticalpublications, including the two 1923 paperson agriculturalexperimentation, and Pearson suggested thatNeyman republish part of the second paperin Englishin (Splawa-Neyman, 1925 [1923b]). But Pearsonbelieved thatNeyman's statementat the end of the paper,that it is only in samplingfrom a normalpopulation that the sample mean and sample varianceare independent,to be mistaken. When Neyman tried to explain to Pearsonhis confusion between independenceand lack of correlation(in haltingEnglish and in frontof severalother Pearson students), Pearson interrupted: "Thatmay be true in Poland, Mr. Neyman, but it is not true here."(Reid, 1982, p. 57) Dismayed at having offended Pearson and at perhapshaving lost a chance to publish in English-his Polish mentors had sent Neyman to as a kind of test to see if his ideas were worth anything, and a publicationin English would go a long way towardssettling that matter-Neyman searchedfor a way to communicatehis explanationto Pearson.He finally offeredhis explanationto J.O. Irwin,who communicatedit to ,who finally convinced his fatherthat Neyman was not mistaken. Thus the Biometrikaversion does contain this observationfrom the Polish version. In the resulting 1925 Biometrikapaper, Neyman gave the higher moments of the means and of samples from finite populations.The paper is succinct and to the point. He begins by deriving formulas for moments of the sampling distributionof the mean of samples of size n from 242 S.E. FIENBERGand J.M. TANUR a finite population of size m, where the sample members are drawn without replacement,and he relates these to the moments of the underlyingfinite population.Then he focuses on the variance, skewness, and kurtosis, of the sampling distribution,i.e., M2, B1 = M/M23, and B2 = M 2/M2, where M2, M3, and M4 are the centralmoments of the samplingdistribution, and he expresses these in terms of /,2, P1 = ,LlJ/,t3, and P2 = 4/ 1A2 where u2>,/3, and 14 are the centralmoments of the finite populationof m elements. He explains how the moments of the sampling distributiontend to the usual ones for sampling with replacementwhen we let m - 0oo.He then turnsto the first and second moments of the sample variance,expressing them in terms kL2 and B2, and to the correlation between the squareof the sample mean and the sample varianceas well as to the correlationbetween the sample mean and the sample variance,expressing the firstin termsof P2 and the second in terms of P31and B2. In the final paragraphof the paper, by letting the finite population size m -+ 0o, Neyman uses the correlationformulas to argue that the independenceof the sample mean and the sample varianceholds only for normaldistributions, the point originallyin dispute with Pearson. The 1925 papermight have passed unnoticed,but for the fact that,in 1927, MajorGreenwood and published a complaintabout it in the Journalof the Royal Statistical Society, pointing out that Neyman had failed to acknowledgethe publishedpapers of the recently deceased Russian statistician,Alexander Alexandrovitch Tchouproff2. Greenwood & Isserlis (1927, p. 348) quote Neyman as writing of the formula for the second moment of the variance"[t]his result, being a generalizationof formulaegiven by other authors,is, I believe, novel and of considerable importance".Indeed, after that statementon page 477 of his 1925 paper,Neyman did actuallycite one of Tchouproff's1918 Biometrikapapers as well as a 1925 paper by Church.Greenwood and Isserlis neverthelesstake him to task for failing to cite works by Karl Pearson, Isserlis, and Edgeworthbesides those of Tchouproff.In their view, Neyman's felony was compoundedby the work of A.E.R. Church,especially a 1926 paperpublished in Biometrika, which cited Neyman for some of the formulasfor moments ratherthan citing Tchouproff.There is little question but that Neyman's results can be found at least in some form in the sea of formulas in Tchouproff's 1923 papers. What appearsto have been happeningin these and other papers by Tchouproffis the alternativealgebraic derivation and expressionof a varietyof momentexpressions, most of which were quite complex. One of the nice featuresof Neyman (1925) is the relativelyclean derivationsand succinct formulas. By the time the Greenwood-Isserliscritique was published,Neyman had returnedto Poland. In obvious distress at the accusation, he wrote to Egon Pearsonsoliciting advice. The reply, it turned out, was handledby Karl Pearson(1927), who editorializedin Biometrikain Neyman's defense (as well as Church's). He argued that Neyman's original publicationin Polish in 1923 was certainly contemporaneouswith Tchouproff'spair of 1923 articles in Metron,and perhapsactually predated those Tchouproffpublications because delays in the publicationof Metroncaused it to appearlater than its cover date. But it seems to us that this argumentfrom Pearson,while it may well be true, is irrelevantto the controversy-Greenwood and Isserlis were accusing Neyman of ignoring not these specific papers of Tchouproffbut a whole body of literaturethat originatedsome 20 years earlier with work by Pearson himself. PerhapsPearson was in some sense defending himself as editor of Biometrikafor his laxness in failing to urge Neyman, the young foreigner,to updatehis work for an English-readingaudience with citationsto the literaturein English. ThatNeyman could have profited by such urging seems clear from anotherfacet of Pearson'seditorial defense-he cites a letter from Dr. K. Bessalik, Professor of the University of Warsaw,who in 1922 was Director of the State Instituteof AgriculturalResearch in Bydgozcz. In that letter, Bessalik certifies that Neyman wrote his paper in Bydgozcz in 1922 and that "no English Journals"were accessible to him. Had Pearson urged Neyman to place the republicationof his results in the historicalcontext about which he had 2This was the transliterationof Tchouproff's name, as it appearedin the 1918 Biometrikaarticles. We use the spelling Tchouproffthroughout, even though when he publishedin Metron,the editor used Tschuprow,and elsewhere he is referred to as Chuprovand Chouprow,but we refer to specific papersusing the spelling used in connectionwith them. The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 243 presumablybeen ignorantduring the time he was writing the original paper in Poland, we believe that Neyman would undoubtedlyhave agreed. He had, after all, come to London with the expressed purpose of studying with Pearson, whose Grammarof Science had served as an inspirationsince Neyman discovered it in 1916.

Since Neyman's 1923/1925 paper did not representa conceptual breakthroughin the theory of sampling from finite populations, we are left with two questions of historical interest. The first is to ask how the isolated scholar arrivedat what may well be an independentderivation of some key results in finite sampling and the second is to ask why Neyman's results, in particular,seem to have had a more profoundinfluence than did similar ones obtainedby others.

Perhapsthe answerto the firstquestion is that,while surelyout of the mainstreamof early twentieth centurystatistics, neitherin Russia nor in Poland was Neyman completely isolated. Neyman studied probabilitytheory in Kharkovunder the direction of the Russian mathematicianS.N. Bernstein as early as 1915 or 1916. Although these activities had been forgottenby Neyman by the time he was describing his early career to Constance Reid in 1978 (and perhaps this forgetfulness in his later life contributesto our image of Neyman's early years as totally isolated), in the mid 1920s Neyman describedhimself as having continuedhis studies underBernstein through 1921. He also notes thathe worked in 1920 underBernstein to apply the theory of probabilityto experimentationin agriculture (Reid, 1982, p. 30). ThroughBernstein, Neyman became familiar with Markov'swork, which also influenced Tchouproff(see also Seneta, 1982, 1985). We know that, before he left Poland, Neyman worked with a ProfessorIsseroff on statisticalanalysis of agriculturalexperiments and even lectured to the AgriculturalDepartment at the Universityof Kharkovon the applicationof probabilitytheory to experimentalproblems in agriculture.These activitieswere clearlyprecursors to the papersdealing with experimentaldesign in agricultureand his paper on finite sampling theory which appearedin Biometrikain 1925.

Just as Fisher was stimulatedby the challenges of real experimentationat Rothamstedto gather together ideas that were "in the air" to make his masterly synthesis of the design of experiments and the analysis of variance, so Neyman could well have been stimulated by the challenges of experimentationat the Institute in Poland to synthesize ideas of sampling from finite populations that were "in the air". And that would lead rathernaturally, given the variablecitation practices at the time, to the kind of papersNeyman wrote in 1923, papersthat had little referenceto the work of others. The lack of references to work in English is particularlyunderstandable, since as his advisor certified to Pearson,Neyman had no access to Englishjournals while he was in Poland.

To answer the second question, that of Neyman's profoundinfluence, we need to look not at the 1923 paper (nor at its 1925 republication),but at Neyman's watershed 1934 paper,in which he was able to capturethe essential ingredientsof the problemof sampling,synthesize his own contributions and those of others, and effectively demolish the idea of purposivesampling.

Neyman arrivedin London in 1925, shortly after the publicationof Fisher's Statistical Methods for Research Workers.But the paths of the two seem not to have crossed duringthe entire academic year. In March of 1926, Gosset passed throughLondon and met with Neyman, who expressed an interest in visiting Fisher at Rothamsted.Gosset wrote to Fisher warninghim that Neyman "holds a letter from me to you asking you to show him anything that you think might be useful to him. I daresay offprints would be useful to him if you could spare them as he finds it hard to trace your work. I gatherthat he will write to you in April".Fisher and Neyman actuallymet for the firsttime in July in Rothamsted,although there is no recordof what they discussed. Neyman then spent a year in Paris, after which he returnedto Poland. The next recordof interactionbetween Neyman and Fisher is linked to Neyman's work in 1932 with Egon Pearsonon the theory of statisticaltests. 244 S.E. FIENBERGand J.M. TANUR

5 The 1925 and 1927 ISI Discussions on Sampling Before Neymanreturned to the problemsof samplingfrom a finitepopulation that he had addressed in 1923, major developments in sampling were presented at the 1925 and 1927 sessions of the InternationalStatistical Institute (ISI), and published in 1926 and 1928, respectively. Kruskal & Mosteller (1980) give a related and somewhatmore detailed discussion. A substantialportion of the 1925 ISI Meeting was taken up with discussions of the "methodof representativesampling". In 1924, the Bureau of the ISI appointeda Commission, consisting of ArthurBowley, CorradoGini, Adolph Jensen, Lucien March, Verijn Stuart,and Frantz Zizek, to study the representativemethod. Jensen served as rapporteurand leader of the discussion at the meeting. The Commission Report (Jensen, 1926a) contains a descriptionof two methods: random sampling(with all elements of the populationhaving the same probabilityof selection), andpurposive selection of large groups of units (in modernterminology clusters) chosen to match the population on selected control variates.The reportdoes not really attemptto choose between the methods. In one of several annexes to the report,Bowley (1926) provideda lengthy theoreticaldevelopment, but failed to describefully how the statisticaltheory of purposivesampling works. What is remarkableto us in this Reportand its Annexes, especially in light of the controversythat Isserlis and Greenwood were to ignite only the next year,is the singularlack of rtoreferences the theoreticalwork on sampling from finite populationsby Isserlis, Neyman, and Tchouproff,although the referencelist in Jensen's (1926b) Annex did include one referenceto Tchouproff,his 1910 thesis! Tchouproffwas presentat the meeting but because he was ratherill he appearedto contributelittle to the discussion. He died soon after the meeting. Yates (1946), in reviewing these proceedings, notes the "lack of any clear conception of the possibility, except by the selection of units wholly at random, or by the inadequateprocedure of sub-dividingthe sample into two or more parts,of so designing samplinginquiries that the sampling errorsshould be capable of exact estimationfrom the results of the inquiryitself". (p. 13). He goes on to describethe developmentsin randomsampling that took place in Englandlinked to agricultural experimentationbetween 1925 and 1935, stimulatedlargely by the elements of randomizationin experimentationand the analysis of variance,both of which he attributesto Fisher. We see this as anotherindication of the lack of impact of Neyman's early contributionsin these two interrelated fields. At the 1927 ISI meeting, CorradoGini presented a paper on the application of the purposive methodto the samplingof recordsfrom the 1921 Italiancensus (see Gini, 1928, and Gini & Galvani, 1929), which was to play a pivotal role in the later work by Neyman. They needed to discardmost of the records of the 1921 census before taking the next one, and they proposedto retain a sample for futureanalyses and reference.To make theirsample representative,they chose to retainall of the data from 29 out of 214 large administrativedistricts into which Italy was divided. They applied the purposivemethod in orderto select the 29 througha process which attemptedto match the averages on seven importantcharacteristics with those for the country as a whole. When they did this, they discoveredthat therewere substantialdeviations between the sample and the entirecountry for other characteristics.This, they claimed, called into question the accuracyof sampling. It remained for Neyman to take up the challenge embodiedin this claim.

6 Neyman's 1934 Paper on Sampling Neyman took up the challenge in his classic 1934 paper presentedbefore the Royal Statistical Society, "Onthe two differentaspects of the representativemethod". We review the paper'selements and emphasize how Neyman rescuedclustering from the clutches of purposivesampling and gave it a rightfulplace in the foundationsof randomsampling methodology. Neyman originally preparedthe paperin 1932 in Polish (with an English summary)as a booklet The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 245

growing out of his practicalexperience. He had been working for the Institutefor Social Problems on a projectinvolving sampling from the Polish census to obtaindata to describe the structureof the working class in Poland. As he wrote to Egon Pearson,he used the opportunityand "pusheda little the theory" (Reid, 1982, p. 105). Publishedin 1933, the original Polish version of the paper traces a good deal of history (Neyman had clearly learnedhis lesson about citation practices and was now unquestionablyfamiliar with the literature);the 1934 version publishedin the Journal of the Royal Statistical Society includes even more history and carefully cites many of the people who were to be in the room for the presentationbefore the Society. Neyman gives enormous credit to Bowley for his 1926 synthesis of the theory underpinningsampling methods, but also notes the hundreds of contributionsin the area on which his comparison between purposive and random sampling builds. There is, however, the curious presentationof optimal allocation with no reference back to Tchouproff(1923a, 1923b). The goal of the paper is to comparepurposive and randomsampling. But elements of synthesis are prominentas well. Neyman describes stratifiedsampling, noting that Bowley considers only the proportionatecase but stating that such restrictionis not necessary.He gives a crisp descriptionof cluster sampling: "Suppose that the population n of M' individuals is grouped into Mo groups. Insteadof consideringthe populationn we may now consideranother population, say n, having for its elements the Mo groups of individuals,into which the populationf is divided .... If there are enormousdifficulties in samplingindividuals at random,these difficultiesmay be greatlydiminished when we adopt groups as the elements of sampling"(pp. 568-569). This is a new synthesis-earlier conceptualizationsof clustering had coupled it with purposive sampling. Indeed, Neyman quotes Bowley as maintainingthat "in purposiveselection the unit is an aggregate,such as a whole district, and the sample is an aggregateof these aggregates,while in randomselection the unit is a person or thing, which may or may not possess an attribute,or with which some measurable quantity is associated"(p. 570). Neyman goes on to explicitly uncouple clustering and purposive sampling, saying, "Infact the circumstancethat the elements of samplingare not humanindividuals, but groups of these individuals,does not necessarily involve a negationof the randomnessof the sampling"(p. 571). He calls this procedure"random sampling by groups"and points out that, although Bowley did not consider it theoretically,he used it in practicein London, as did 0. Andersonin Bulgaria. Neyman also speaksof combiningstratification with clusteringto form "randomstratified sampling by groups".[Bowley, in his role as the lead discussantof Neyman's paper,notes the innovativeness of Neyman's suggestion of randomstratified sampling of groupsand acknowledgesthat in fact it was what he had been drivento use in his work even thoughhe has not fully recognized the implications of what he had done]. Then Neyman refersto the full theoryfor best linearunbiased estimation3 of a populationaverage developed in his 1933 Polish publication,and he gives the now familiarformula for the varianceof the "natural"weighted averageestimator in stratifiedcluster sampling. In order to compare Gini and Galvani's method of purposiveselection with his own method of random stratifiedsampling by groups, Neyman imbues the purposivemethod with a structurethat allows one to treat it as if it were based on a special form of random selection, even though this was clearly not the way Gini and Galvaniactually selected or conceived of their groups (districts)4 3There was an implicit assumptionin Neyman's work regardingthe uniqueness of the best linear unbiased estimate in sampling from finite populations.This assumptionturns out to be false. Godambe& Thompson(1971, pp. 386-387) observe that "Neymanproposed the use of the Gauss-Markofftechnique to prove the optimality(UMV-ness) of the sample mean and othersimilar estimators. Indeed, during the discussion,Fisher concurred in this method;and it appearsthat the Gauss-Markoff now technique, known to be of doubtful validity in sampling theory (Godambe, 1955), was one of the very few points on which Neyman and Fisher were in agreement". 4Neymanenvisions a universeof districtsdivided up into strataaccording to the values of one or more controlvariates, and then each stratumis subdividedinto substrataaccording to the numberof units in the district.Then within each substratum he speaks of the randomselection of a preassignednumber of districts.Of course, because of the large size of the districtsin Gini and Galvani's situation, most stratawould contain zero or one district.This leads to the sampling bias which Neyman then illustrates. 246 S.E. FIENBERGand J.M. TANUR

In doing so, Neyman constructsa sturdycoffin inwhich to burythe method of purposiveselection, and then he proceeds to slam the lid of the coffin tight and nail it shut, by presentingone statistical argumentafter another.First, he considers the conditions underwhich the purposivemethod, which utilizes the regression of group means on a control variatey, producesunbiased estimates. He then notes, on the basis of calculations from Gini and Galvani's own data, that the conditions appear not to be satisfied in practice.Neyman goes further,however, and points out that Gini and Galvani make the implicit assumptionthat the groups (clusters) are themselves random samples from the population,something that is decidedly false. Neyman's own method of randomstratified sampling by groups does not have these deficiencies.Neyman carriesthe argumentone final step by exploring, using his variance formula, whether it is preferableto sample a small numberof large groups (in effect a varianton Gini and Galvani)or a large numberof smallerunits. The latterturns out to be the clear choice and this provides Neyman with the final nail to hammershut the coffin of the purposive method. Shortly later, the coffin was buried,although the ghost of the purposivemethod continues to rise until this day in the form of quota sampling. The immediateeffect of Neyman's paperws to establish the primacyof the methodof stratified random sampling over the method of purposiveselection, something that was left in doubt by the 1925 ISI presentationsby Jensen and Bowley. But the paper'slonger-term importance for sampling was the consequenceof Neyman's wisdom in rescuingclustering from those who were the advocates of purposive sampling and integratingit with stratificationin a synthesis that laid the groundwork for modern-daymultistage probability sampling. The heart of the 1934 paper, however, in terms of the amount of space Neyman devoted to exposition and in terms of emphasis in the discussion by others at the presentationbefore the Royal Statistical Society, is the materialon confidenceintervals. There was much confusion as to whether this was just Fisher's fiducial method presentedin a slightly differentfashion or a new inferential approach.Neyman introducedthe idea of coverage in repeatedsamples, but the discussion did not pick up on its originality.It is somewhatironic thatin his discussionof Neyman (1934), Leon Isserlis, who only seven years earlier had condemnedNeyman for not giving enough credit to Tchouproff, completely overlooked the fact that Neyman failed to credit Tchouproffin this paper,this time for proposing the notion of optimal allocation in stratifiedsampling. We speculate that Isserlis' failure to question this point arose largely from the troublehe was having understandingNeyman's concept of confidence intervals,the latterbeing the sole topic of Isserlis' discussion. Two decades later Neyman (1952a) acknowledged Tschouproff's priority in discovering these results:

"I am obliged to Dr. Donovan J. Thompsonof the StatisticalLaboratory, Iowa State College, Ames, Iowa, for calling my attentionto the article of A.A. Tschuprow,"On the mathematicalexpectation of the moments of frequencydistributions in the case of correlatedobservations" published in Metron,Vol. 2, No. 4 (1923), pp. 646-680, which contains some resultsrefound by me and published,without reference to Tschuprow,in 1933. The results in question are the general formulafor the varianceof the estimate of a mean in stratifiedsampling and the formuladetermining the optimumstratification of the sample. These formulaeappeared first in a Polish bookletAn Outlineof the Theory and Practice of RepresentativeMethod, Applied in Social Researchpublished in 1933 by the WarsawInstitute of Social Problems.Later on they were republishedin English in the Journal of the Royal StatisticalSociety, Vol. 97 (1934), pp. 558-625. Finally,the same formulae, again without a reference to Professor Tschuprow,were given in the second edition of my book, Lecturesand Conferenceson MathematicalStatistics and Probability,Washington, D.C., 1952. The purpose of this note is, then, to recognize the priorityof ProfessorTschuprow, The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 247

to express my regretfor overlooking his results and to thankDr. Thompson for calling my attentionto the oversight." And in the 1970s, Neyman continued in this apologetic vein by reprintingthe recognition of priorityin an introductionto a book of correspondencebetween Markovand Tschouproff(Neyman, 1981). What seems clear today for those of us who go back for Tschouproff(1923a, 1923b) is that if Neyman had read these papers, it would have been difficult to have missed Tschouproff'streatment of stratificationalthough we confess to having to search to identify the result on optimal allocation. Thus, the simplest interpretationfor us is thatwhen Neyman was developingthe materialfor the 1934 paper in Poland he not only did not have easy access to Tschouproff'spaper, but simply presumed that everyone would accept the notion of stratification,especially in light of the ISI discussion and publicationson the topic. Fisher's lenghty discussion of the Neyman paperis noteworthyin the presentcontext, not so much for his challenge that Neyman's confidence intervalswere in some senses fiducial intervalsunder a differentname, butrather for his observationson the importanceof the parallelismbetween sampling in economic researchand the role of sampling of finite populationsin agriculturalexperimentation. The majordistinction, he was reportedas saying, was that:"In a well-designed experiment,however, the mathematics were simplified, and all anxiety was avoided in respect to different systems of weighting" (p. 616). Of course, Fisher did recognize that Neyman's goal in creating confidence intervals was quite different from his own goal of inductive inferences from the sample at hand. As Edwards (1995) remindsus, Fisherexpessed his skepticismregarding the value of the coverageproperty of confidence intervalsand he remarkedin his discussion that Neyman's generalization"was wide and handsome, but it had been erected at considerableexpense, and it was perhapsas well to count the cost" (p. 618). Our question of why the Neyman paper had such a profoundinfluence compared to the earlier work of Tchouproffand others,was also raisedby Bellhouse (1988) and Kruskal& Mosteller (1980). We believe that while Tchouproff had clearly derived a numberof the technical results a decade earlier,and had droppedthe constraintof constantprobabilities of selection, his paperswere abstract and formal in nature,and his results were far removed from real-worldapplication. Neyman more clearly laid the groundworkfor statistical practice by his innovative integrationof clustering and stratification,and his clear and convincing exposition of the inferiority of the purposive method. Neyman provided the recipe for others to follow and he continuedto explain its use in convincing detail to those who were eager to make randomsampling a standarddiet for practicalconsumption (e.g., see Neyman, 1952b, for a description based on his 1937 lectures on the topic at the U.S. Departmentof AgricultureGraduate School).

7 Fisher, Neyman, and The Design of Experiments in 1935 The year after Neyman presentedhis sampling paper to the Royal Statistical Society, three pub- lications resolved all doubt about who was to gain full credit for the design of experiments as an areaof statistics. To understandthese events surroundingthese publications,however, we move back several years and trace the interlockingdevelopments in this area involving Fisher and Neyman. Following Fisher's series of papers on experimentaldesign in the 1920s, those at Rothamsted such as Yates and Wishart helped to propagatehis ideas and move them into actual agricultural practice, and Fisher temporarilyturned his focus to the completion of The Genetical Theory of Natural Selection, publishedin 1930, as well as relatedissues of genetics and eugenics and his ideas on inverse probability.But Fisher and others recognized the need to pull together the new ideas on statistics and experimentation.In 1930 he lectured on the topic at ImperialCollege in London and at Chelsea Polytechnic and then he spent the summer of 1931 at Iowa State University giving 248 S.E. FIENBERGand J.M. TANUR some related lectures. This was to be the beginning of notes that ultimately led to The Design of Experiments,in 1935. Meanwhile, Neyman was workingon a series of paperson hypothesis testing with Egon Pearson,and on some agriculturalexperimentation problems. By the late 1920s, Fisher was becoming restless at Rothamsted and considered applying for positions at Britishuniversities (Box, 1978, pp. 202-203). Box (1978) and Reid (1982) describesome of the correspondencebetween Fisher and Neyman in 1932 and 1933, primarilyabout Neyman's paperwith Egon Pearsonon tests of hypotheses.When KarlPearson retired from UniversityCollege in 1933, both his position and the departmentwere split in two, with Egon Pearsonbeing appointed as Reader in Statistics and head of the Departmentof Applied Statistics and Fisher being given the position of Galton Professorof Eugenics and responsibilityfor the Galton Laboratory.A major issue between the two was responsibility for teaching of statistics. Pearson proposed that Fisher not teach statisticaltheory, and so Fisherresponded by proposingto teach the logic and philosophy of experimentation(Box, 1978, p. 259 and Bennett, 1990, p. 192). Here we see anotherimpetus for Fisher to complete the book on experimentaldesign. In mid 1933, upon hearing of Fisher's appointmentat UniversityCollege, Neyman wrote to him asking about the possibility of a position in the Laboratory.But Fisher had nothingto offer specifically in statisticsand, in 1934, Neyman left Poland andjoined Pearson'sdepartment for a threemonth visiting position, which ultimatelyturned into a permanentntone. It was also at about this time that Fisher nominatedNeyman for membership in the InternationalStatistical Institute. That first termback in London,Neyman lecturedon interval estimationand Fisher on experimentation,with Neyman in attendance. In Marchof 1935, Neymanpresented a paperbefore the recentlyformed Industrial and Agricultural Research Section of the Royal StatisticalSociety (Neyman, 1935) entitled "Statisticalproblems in agriculturalexperimentation". The paper was based on work done in Poland, in co-operationwith two junior colleagues, Iwasziewicz and Kolodziejczyk.In it, he presentedformal statisticalmodels for the analysis of randomizedblock anLatinand squaredesigns, and he claimed that "the application of the usual tests of significanceto the resultsof experimentsby the two methodsis not yet rigorously justified, thoug thteeffect of the inaccuracyinvolved may be negligible". In other words (but not those explicitly used by Neyman), Fisher's claims about the validity of tests were wrong! Neyman gives the model explicitly; treatmentsaffect individualplots in a row-by-columnlayout differently (i.e., unit-treatmentnon-additivity). The model and Neyman's discussion of it includes the notion of an infinite series of repetitionsof the experiment,and it is with respect to this long-rundistribution that he takes expectations. In an appendix, he shows that, if the average treatmenteffect over the entire experimentalarea is the same for all treatments,the usual estimate of the error variance is unbiased for randomizedblocks whereas for his Latin squaremodel there is a bias. He repeats the opening claim that this bias may not be substantial.This means that the F test (referredto by both Fisher and Neyman as the z test) for treatmenteffects in a Latin square "may ... cause a tendency to state significant differentiationwhen this, in fact, does not exist" (p. 154). Then Neyman goes on to examine the test of hypothesis that one treatmenthas a greater effect than another and he demonstrates,in a series of examples, that Latin squares are often less efficient than randomized blocks. Throughoutthe paper Neyman employed the notions of type I and type II errors he had developed with Pearson. Fisher was the lead discussantof the paperand he respondedwith greatsarcasm about Neyman's surprisingresults noting that those present"had to thankhim, not only for bringingthese discoveries to their notice, but also for concealing them from public knowledge until such time as the method should be widely adopted in practice!"(p.155). Fisher questioned Neyman's apparentinability to grasp the very simple argumentby which the unbiased characterof the test of significance might be demonstratedand then he presenteda simple illustrationin the context of a 5 x 5 Latin square. Here Fisher states ratherobliquely one of his key differences with Neyman, the hypothesis to be tested, which in the Fisherianformulation is that treatmentshave no effect on yields whereas in the The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 249

Neyman formulationit relates to zero treatmenteffects averagedover all plots. Then Fisher invokes the randomizationdistribution, which in effect assumes unit-treatmentadditivity. Yates and Wishart also contributedto the discussion in a more polite, but just as critical fashion, and when Neyman attemptedto respondbriefly, Fisher interruptedwith the final statement: "Ithink it is clear to everyonepresent that Dr. Neyman has misunderstoodthe intention- clearly and frequentlystated-of the z test and of the Latin Squareand other techniques designed to be used with that test. Dr. Neyman thinks that anothertest would be more important.I am not going to arguethat point. It may be thatthe questionthat Dr. Neyman thinksshould be answeredis more importantthan the one I have proposedand attempted to answer.I suggest thatbefore criticizingprevious work it is always wise to give enough study to the subject to understandits purpose. Failing that it is surely quite unusualto claim to understandthe purposeof previous work betterthan its author. From a purely statisticalstandpoint the position is thatDr. Neyman would like a different test to be made, and I hope he will invent a test of significance, and a method of experimentation,which will be as accuratefor questions he considers to be important as the Latin squareis for the purposefor which it was designed." Neyman later respondedat length in writingbut his relationwith Fisherhad been shatteredand most present at the presentationof the paperseemed to side with Fisher.The controversythat began with this exchange raged throughoutFisher's lifetime, and it included not only fundamentalaspects of experimentaldesign but also two very differentapproaches to testimony and statisticalinference. The dispute that erupted over the Neyman paper was bound up in issues of estimation and hypothesis testing and the difference between the two types of inference. Even with Neyman's model, the usual F-test in the Fisherianapproach has the correctnull distributionwhen there is "no treatmenteffect", because then there is no issue of whether treatment-unitadditivity holds. Fisher correctly pointed this out in the discussion. Neyman was insisting on a different null hypothesis, that the treatmenteffect, through its interactionwith plots, averages out over the complete set of plots used in the experiment.This is essentially estimatingthe treatmenteffect in the non-null case, however,and here it matterswhether one allows for treatment-unitadditivity or not. Fisherdid allude to this difference,but in languagethat remains difficult to decipher.(Rhetoric was at a premiumrather than statistical clarity.)Nelder (1994) essentially argues that Neyman's errorwas in formulatingan uninterestingnull hypothesis, largely as a result of the non-hierarchicalnature of its construction.In such circumstancesone also needs to look at the alternatives.Thus Nelder arguesthat Fisher's attack on Neyman was irrelevantbecause he dealt only with the null case. Further,the Neyman/Pearson approachto confidence intervals and tests of hypotheses, which Fisher was never to accept, was essential to the Neyman argumentin this paper; the ideas were still too new for most present to understand.Finally, we note that randomizationseemed to play little role in Neyman's argumentsin the 1935 paper whereas it was essential to Fisher's ideas, clearly structuringall of his thinking on models and the validity of related statisticalmethods for the analysis of experiments. Just two months after Neyman's presentation,Fisher's successor at Rothamsted, (1935), presentedhis own paperon agriculturalexperimentation before the same section of the Royal Statistical Society and the debate between Fisher and Neyman continued,with Yates becoming the target for Neyman's criticism. In a lengthy critique of Yates' exposition of the theory of factorial experimentation,submitted in writing after the meeting, Neyman again used his work with Pearson on tests of hypotheses to suggest that interactionshad low power of detection and that Yates' (and thus Fisher's) definition of interactionswere fatally flawed. This discussion seems to have confused the natureof the contrastsused to estimate effects in the Fisher-Yatesframework and Yates avoided the possibility of others being similarly confused by altering slightly the definitions in the printed version of his paper (which followed Neyman's paperin the Journal).This minor alterationvitiated 250 S.E. FIENBERGand J.M. TANUR all of Neyman's analysis. Yates also respondedrather caustically in writingto Neyman's critiqueand Neyman then submittedan addendumpointing out why he still thoughtthe Fisher-Yates approach was wrong. The controversybetween Neyman and Fisher (with Yates as an occasional surrogate) was now full-blown. The final of the trio of 1935 publicationson experimentation,Fisher's The Design of Experiments, was the most significant.Fisher completedthe preface the month afterYates presentedhis paper.In this book, Fisher presentedan integratedperspective on the logic and principlesof experimentation. After a brief introductionin which he explains why for inferences he does not use Bayes' theorem in the book, Fisher turns to his famous example of "", based on an actual experimenthe proposedmore than 12 years earlierat Rothamsted,in responseto the claim by Muriel Bristol, an algologist, who claimed that she could tell the differencebetween a cup of tea into which the milk had been poured first (her preference) and one into which the tea had been poured first (which Fisher had offered her). Here Fishermade his strongestarguments for the need to randomize. In the next chapter he dealt with Darwin's data on cross- and self-fertilization (in which there was no randomization),introducing the notion of matching,and this time he used the argumentof randomizationto "justify"the use of the t-distributionfor the key comparison.Subsequent chapters dealt with randomizedblocks, Latin squares, factorial designs, confounding, partialconfounding, and the analysis of covariance. The final pair of chaptersdealt with more theoreticalmaterial on estimation:fiducial inference and the measurementof information. Few of the ideas in the book were totally new, but their integrationand the power of Fisher's exposition of them helped to establishrandomized experimentation as basic to the scientist's toolkit. The battle over randomizationcontinued to be waged in British journals, with Gosset, who still advocated the use of systematic designs such as Beavan's drill strip or sandwich design, being Fisher's main antagonist.The debate essentially ended with Gosset's deathin 1937, althougha final critiqueby Gosset appearedposthumously in Biometrikain 1938. On the other side of the Atlantic, there was no such battle, and in 1936, on the occasion of its 300th anniversary,Harvard University awardedFisher an honorarydegree at least in partfor this achievement.Just as with Neyman and his 1934 paper on sampling, Fisher's The Design of Experimentsprovided a recipe for others to follow.

8 Conclusions

Both Fisherand Neyman made fundamentalcontributions to statisticalmethodology that underpins the modern approachesto the design of experimentsand the design of sample surveys. In this paper we have retracedtheir work on these topics, especially duringthe criticalyears from 1921 to 1935, and we attemptto explainthe pivotalroles of Neyman's 1934 paperon agriculturalexperimentation which ultimatelyled to a rift between the two men that was neverto be repaired.In the aftermath,Neyman staked out intellectualresponsibility for samplingand Fisher did the same for experimentation.The two fields subsequentlydrifted apart. Two issues were raised in the discussion of an earlierversion of this paperpresented at the 50th ISI Session in Beijing, China. In this discussion, others challenged us regardingthe real reason for the separationof the design of experimentsand sample surveysafter 1935, and regardingthe original of Neyman's contributionsto sampling.We comment on each briefly. Some have argued that it was the fundamentaldifferences between experimentationand survey work that were truly responsible for the separationof the two fields within statistics and not the fued between Fisher and Neyman. For example,T.F.M. Smith (personalcommunication) argues that randomizationin an experimentsupports internal validity, to the collection of experimentalunits, whereas random selection in a survey setting supports external validity, allowing generalization to those not in the sample. This is technically correct, but only in a narrowsense. The statistical tools used in the two fields are essentially the same and the real scientific goals of an experimenter The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 251 involve generalizationbeyond the experimentalunits. This is exactly why we have emphasized the importanceof experimentsembedded in surveys and surveys embedded in experiments (Fienberg & Tanur,1988,1989). The recent interestin the common featuresof experimentationand sampling suggests that the technical links are important,as many of the early contributorsrecognized in the 1930s. Thus we do not believe that the differencesin perspectivebetween experimentationas surveys alone can account for the separation.The clash of two strong personalitieswho were determinedto gain recognition for their inferentialapproaches, in our view, was a majorcontribution to the split. Otherscontinue to raise the issue of the extentto which Neyman "stoodon the shouldersof giants". In particular,Chris Heyde (personal communication)raises the question of Neyman's debt to the Russian school of sampling led by Tschouproff.It is true that Neyman was educatedin Karkovand studied probabilitywith Bernstein, as we note above, but he appearsto have had little or no contact with statistics and the sampling of finite populations. Thus, as we argued above, we believe that the contributionsin Neyman's 1923 paperrepresent independentinen discovery. In the 1925 Biometrika version there is a reference to an earlier Biometrikapaper by Tschouproff,but no recognition of Tschouproff's 1923 Metronpaper. This appearssomewhat sloppy and even in 1934, Neyman did not single out Tschouprofffor recognition,either in generalor in connectionwith optimalallocation. But Neyman's purpose in 1934 was differentand he broughta new integrativeperspective to sampling in the paperthat differed from those who wrote on the topic previously. Thus while we believe that Neyman was influenced by the work of many others-few of whom he referenced-he broughtoriginality to his paperson samplingand the clarity of his exposition was crucial to the subsequentdevelopment of sampling. As a result, he rightly became one of the giants on whose shoulders others have stood. We continue to see importantlinks between the fields of experimentationand sample surveys and we find their roots in the pioneering work of both Fisher and Neyman, during a time of great intellectual ferment in the 1920s and 1930s. We believe that the statisticalprofession has much to learn from their creativeideas.

Acknowledgments A preliminaryversion of parts of this paper dealing with Neyman was presentedat the "Interna- tional Conference on Statistics to Commemoratethe 100th Anniversaryof Jerzy Neyman's Birth", Jarchranka,Poland, November 25-26, 1994, and appearedas Fienberg& Tanur(1995a). The present paperhas addedconsiderably to the materialdrawn from thatearlier one. We thankA.W.F. Edwards, Chris Heyde, , Eugene Seneta and Fred Smith for comments on an earlier version of the present paper, which was presented at the 50th ISI Session in Beijing, China in August 1995 (Fienberg& Tanur1995b).

References

Bellhouse, D.R. (1988). A brief history of randomsampling methods. In Handbookof Statistics.Eds. P.R. Krishnaiah& C.R. Rao, Vol. 6. North Holland, Amsterdam,pp. 1-14. Bowley, A.L. (1926). Measurementof the precision attainedin sampling. (Annex A to the Reportby Jensen). Bulletin of the InternationalStatistical Institute,22, Supplementto Liv. 1: 1-62. Bennett,J.H., ed. (1990). Statistical InferenceandAnalysis: Selected Correspondenceof R.A.Fisher. Oxford UniversityPress, Oxford. Box, J.F. (1978). R.A. Fisher, The Life of a Scientist. Wiley: New York. Box, J.F. (1980). R.A. Fisher and the Design of Experiments, 1922-1926. AmericanStatistician, 34, 1-7. Church, A.E.R. (1926). On the moments of the distributionof squared standarddeviations of small samples from any population.Biometrika, 18, 321-394. Cochran, W.G. (1980). Fisher and the analysis of variance. In R.A. Fisher: An Appreciation.Eds. S.E. Fienberg and D.V. Hinkley. Springer-Verlag:New York,pp. 17-34. Eden, T. & Fisher, R.A. (1927). Studies in crop variation.IV. The experimentaldetermination of the value of top dressings with cereals. Journal ofAgricultural Science, 17, 548-562. 252 S.E. FIENBERGand J.M. TANUR

Edwards,A.W.F. (1995). XVIII-thFisher MemorialLecture delivered at the NationalHistory Museum, London on Thursday, 20th October, 1994: Fiducial Inferenceand the FundamentalTheorem of NaturalSelection. Biometrics, 51, 799-809. Fienberg, S.E. & Tanur,J.M. (1987). Experimentaland sampling structures:Parallels diverging and meeting. International Statistical Review, 55, 75-96. Fienberg,S.E. & Tanur,J.M. (1988). Fromthe inside out and the outside in: Combiningexperimental and samplingstructures. CanadianJournal of Statistics, 16, 135-151. Fienberg, S.E. & Tanur,J.M. (1989). Combiningcognitive and statisticalapproaches to survey design. Science, 243, 1017- 1022. Fienberg, S.E. & Tanur,J.M. (1995a). ReconsideringNeyman on experimentationand sampling:Controversies and funda- mental contributions.Probability and MathematicalStatistics, 15, 47-60. Fienberg,S.E. & Tanur,J.M. (1995b). Reconsideringthe fundamentalcontributions of Fisherand Neyman on experimentation and sampling. Bulletin of the InternationalStatistical Institute,50th Session, InvitedPapers Book 3, 1243-1260. Fisher,R.A. (1918). The correlationbetween relativeson the suppositionof Mendelianinheritance. Transactions of the Royal Society of Edinburgh,52, 399-433. Fisher, R.A. (1921). Studies in crop variation.I. An examinationof the yield of dressed grain from Broadbalk.Journal of AgriculturalScience, 11, 107-135. Fisher, R.A. (1925). Statistical Methodsfor ResearchWorkers. Oliver and Boyd, Edinburgh. Fisher, R.A. (1926). The arrangementof field experiments.Journal of the Ministryof Agriculture,33, 503-513. Fisher, R.A. (1935). The Design of Experiments.Oliver and Boyd, Edinburgh. Fisher, R.A. (1952). Statisticalmethods in genetics. Heredity,6, 1-12. Fisher, R.A. & Eden, T. (1927). Studies in crop variation.II. The experimentaldetermination of the value of top dressings with cereals. Journal of AgriculturalScience, 17, 548-562. Fisher, R.A. & Mackenzie, W.A. (1923). Studies in crop variation.II. The manurialresponse of differentpotato varieties. Journal of AgriculturalScience, 13, 311-320. Gini, C. (1928). Une applicationde la methoderepresentative aux materiauxdu dernierrecensement de la populationitalienne (ler decembre 1921). Bulletin of the InternationalStatistical Institute, 23, Liv. 2, 198-215. Gini, C. & Galvani,L. (1929). Di unaapplicazionedel metodorappresentative all'ultimo censimento Italiano dellapopolazione (1? decembri, 1921). Annali di Statistica,Series 6, 4, 1-107. Godambe,V.P. (1955). A unifiedtheory of samplingfrom finite population.Journal of the Royal Statistical Society,Series B, 17, 269-278. Godambe,V.P. & Thompson,M.E. (1971). Bayes, fiducialand frequencyaspects of statisticalinference in regressionanalysis in survey sampling (with discussion). Journalof the Royal StatisticalSociety, Series B, 17, 361-390. Greenwood, M. & Isserlis, L. (1927). A historical note on the problem of small samples. Journal of the Royal Statistical Society, 90, 347-352. Isserlis, L. (1918). Formulaefor determiningthe mean values of productsof deviationsof mixed moment coefficients in two to eight variablesin samples taken from a limited population.Biometrika, 12, 183-184. Jensen, A. (1926a). Reporton the representativemethod in statistics.Bulletin of the InternationalStatistical Institute, 22, Liv. 1, 359-380. Jensen, A. (1926b). The representativemethod in practice.(Annex B to the Reportby Jensen). Bulletin of the International Statistical Institute,22, Liv. 1, 381-439. Kruskal, W.H. & Mosteller, F. (1980). Representativesampling, IV: The history of the concept in statistics, 1895-1939. InternationalStatistical Review,48, 169-195. Lehmann,E.L. & Reid, C. (1982). In Memoriam,Jerzy Neyman 1894-1981. AmericanStatistician, 36, 161-162. Nelder, J.A. (1994). The statisticsof linear models: back to basics. Statisticsand Computing,4, 221-234. Neyman, J. (1933). Zarys teorii ipraktykibandania struktury ludnosci metoda reprezentacyjna (An Outlineof the Theoryand Practice of RepresentativeMethod, Applied in Social Research).Institute for Social Problems,Warsaw. (In Polish with an English summary). Neyman, J. (1934). On two differentaspects of the representativemethod: the method of stratifiedsampling and the method of purposiveselection (with discussion). Journal of the Royal StatisticalSociety, 97, 558-625. Neyman, J. (1952a). Recognitionof priority.Journal of the Royal StatisticalSociety, 115, 602. Neyman, J. (1952b [1938]). Lectures and Conferences on MathematicalStatistics and Probability. U.S. Departmentof Agriculture,Washington, DC. (The 1952 edition is an expandedand revisedversion of the original 1938 mimeographed edition). Neyman, J. (1981). Introductionto The CorrespondenceBetween A.A. Markovand A.A. Chuprovon the Theoryof Probability and MathematicalStatistics, (Kh. O. Ondar,ed.), Springer-Verlag,New York. Neyman, J., with the cooperation of K. Iwaszkiewicz and St. Kolodziejczyk. (1935). Statistical problems in agricultural experimentation(with discussion). Supplementto the Journalof the Royal StatisticalSociety, 2, 107-180. Pearson,K. (1927). Another"historical note on the problemof small samples".(Editorial) Biometrika, 19, 207-210. Reid, C. (1982). Neymanfrom Life. Springer-Verlag,New York. Rubin, D.B. (1990). Comment: Neyman (1923) and causal inference in experimentsand observationalstudies. Statistical Science, 5, 472-480. Russell, E.J. (1926). Field experiments:How they are made and what they are. Journal of the Ministryof Agriculture,32, 989-1001. Seneta, E. (1982). Chuprov (or Tschuprow),Alexander Alexandrovich.In Encyclopedia of Statistical Sciences, Vol. 1 (S. Kotz and N. Johnson,eds.). Wiley, New York,477-479. Seneta, E. (1985). A sketch of the history of survey sampling in Russia. Journal of the Royal Statistical Society, Series A, The FundamentalContributions of Fisher and Neymanon Experimentationand Sampling 253

148, 118-125. Splawa-Neyman,J. (1925[1923b])Contributions of thetheory of smallsamples drawn from a finitepopulation. Biometrika, 17, 472-479. (Thenote on this republicationreads "These results with others were originally published in La Revue Mensuellede Statistique,publ. parl'Office Central de Statistiquede la RepubliquePolonaise, tom. vi. pp. 1-29, 1923"). Splawa-Neyman,J. (1990[1923a]). On the application of probabilitytheory to agriculturalexperiments. Essay on principles. Section9. StatisticalScience, 5, 465-472. Translatedand edited by D.M.Dabrowska & T.P.Speed from the Polish original, which appearedin RocznikiNauk Rolniczyc,Tom X (1923): 1-51 (Annals of AgriculturalSciences). 'Student'(1923). On testing varieties of cereals.Biometrika, 15, 271-293. 'Student'(1938). Comparison between balanced and rand arrangements of fieldplots. Biometrika, 29, 363-369. Tschuprow,Al.A. (1923a). On the mathematical expectation of themoments of frequencydistributions in thecase of correlated observations(Chapters I-III). Metron, 2, 461-493. Tschuprow,Al.A. (1923b). On the mathematicalexpectation of the momentsof frequencydistributions in the case of correlated observations(Chapters IV-VI). Metron,2, 646-680. Yates, F. (1935). Complex experimentation(with discussion). Supplementto the Journal of the Royal Statistical Society, 2, 181-247. Yates, F. (1946). A review of recent developmentsin sampling and sample surveys (with discussion). Journal of the Royal Statistical Society, 109, 12-42.

Resume R.A. Fisherand JerzyNeyman sont bien reconnuscomme les statisticiensqui ont etabliles idees fondamentalesqui soutiennentle plandes experienceset le plandes enguetespar sondage, respectivement. Dans cet articlenous revoyons les contributionscentrales de ces hommesfameux dans les deuxdomaines de recherche.Nous adressonsaussi l'effet d'une controversequi a rapporta l'articlede Neyman(1935) au sujetde l'experimentationagronome qui a aboutia uneseparation dansle champsdes recherchesdes experiences et des6chantillonnages.

[ReceivedMarch 1996, accepted July 1996]