Yates and Contingency Tables: 75 Years Later
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
F:\RSS\Me\Society's Mathemarica
School of Social Sciences Economics Division University of Southampton Southampton SO17 1BJ, UK Discussion Papers in Economics and Econometrics Mathematics in the Statistical Society 1883-1933 John Aldrich No. 0919 This paper is available on our website http://www.southampton.ac.uk/socsci/economics/research/papers ISSN 0966-4246 Mathematics in the Statistical Society 1883-1933* John Aldrich Economics Division School of Social Sciences University of Southampton Southampton SO17 1BJ UK e-mail: [email protected] Abstract This paper considers the place of mathematical methods based on probability in the work of the London (later Royal) Statistical Society in the half-century 1883-1933. The end-points are chosen because mathematical work started to appear regularly in 1883 and 1933 saw the formation of the Industrial and Agricultural Research Section– to promote these particular applications was to encourage mathematical methods. In the period three movements are distinguished, associated with major figures in the history of mathematical statistics–F. Y. Edgeworth, Karl Pearson and R. A. Fisher. The first two movements were based on the conviction that the use of mathematical methods could transform the way the Society did its traditional work in economic/social statistics while the third movement was associated with an enlargement in the scope of statistics. The study tries to synthesise research based on the Society’s archives with research on the wider history of statistics. Key names : Arthur Bowley, F. Y. Edgeworth, R. A. Fisher, Egon Pearson, Karl Pearson, Ernest Snow, John Wishart, G. Udny Yule. Keywords : History of Statistics, Royal Statistical Society, mathematical methods. -
Contingency Tables Are Eaten by Large Birds of Prey
Case Study Case Study Example 9.3 beginning on page 213 of the text describes an experiment in which fish are placed in a large tank for a period of time and some Contingency Tables are eaten by large birds of prey. The fish are categorized by their level of parasitic infection, either uninfected, lightly infected, or highly infected. It is to the parasites' advantage to be in a fish that is eaten, as this provides Bret Hanlon and Bret Larget an opportunity to infect the bird in the parasites' next stage of life. The observed proportions of fish eaten are quite different among the categories. Department of Statistics University of Wisconsin|Madison Uninfected Lightly Infected Highly Infected Total October 4{6, 2011 Eaten 1 10 37 48 Not eaten 49 35 9 93 Total 50 45 46 141 The proportions of eaten fish are, respectively, 1=50 = 0:02, 10=45 = 0:222, and 37=46 = 0:804. Contingency Tables 1 / 56 Contingency Tables Case Study Infected Fish and Predation 2 / 56 Stacked Bar Graph Graphing Tabled Counts Eaten Not eaten 50 40 A stacked bar graph shows: I the sample sizes in each sample; and I the number of observations of each type within each sample. 30 This plot makes it easy to compare sample sizes among samples and 20 counts within samples, but the comparison of estimates of conditional Frequency probabilities among samples is less clear. 10 0 Uninfected Lightly Infected Highly Infected Contingency Tables Case Study Graphics 3 / 56 Contingency Tables Case Study Graphics 4 / 56 Mosaic Plot Mosaic Plot Eaten Not eaten 1.0 0.8 A mosaic plot replaces absolute frequencies (counts) with relative frequencies within each sample. -
Use of Chi-Square Statistics
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2008, The Johns Hopkins University and Marie Diener-West. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed. Use of the Chi-Square Statistic Marie Diener-West, PhD Johns Hopkins University Section A Use of the Chi-Square Statistic in a Test of Association Between a Risk Factor and a Disease The Chi-Square ( X2) Statistic Categorical data may be displayed in contingency tables The chi-square statistic compares the observed count in each table cell to the count which would be expected under the assumption of no association between the row and column classifications The chi-square statistic may be used to test the hypothesis of no association between two or more groups, populations, or criteria Observed counts are compared to expected counts 4 Displaying Data in a Contingency Table Criterion 2 Criterion 1 1 2 3 . C Total . 1 n11 n12 n13 n1c r1 2 n21 n22 n23 . n2c r2 . 3 n31 . r nr1 nrc rr Total c1 c2 cc n 5 Chi-Square Test Statistic -
Knowledge and Human Capital As Sustainable Competitive Advantage in Human Resource Management
sustainability Article Knowledge and Human Capital as Sustainable Competitive Advantage in Human Resource Management Miloš Hitka 1 , Alžbeta Kucharˇcíková 2, Peter Štarcho ˇn 3 , Žaneta Balážová 1,*, Michal Lukáˇc 4 and Zdenko Stacho 5 1 Faculty of Wood Sciences and Technology, Technical University in Zvolen, T. G. Masaryka 24, 960 01 Zvolen, Slovakia; [email protected] 2 Faculty of Management Science and Informatics, University of Žilina, Univerzitná 8215/1, 010 26 Žilina, Slovakia; [email protected] 3 Faculty of Management, Comenius University in Bratislava, Odbojárov 10, P.O. BOX 95, 82005 Bratislava, Slovakia; [email protected] 4 Faculty of Social Sciences, University of SS. Cyril and Methodius in Trnava, Buˇcianska4/A, 917 01 Trnava, Slovakia; [email protected] 5 Institut of Civil Society, University of SS. Cyril and Methodius in Trnava, Buˇcianska4/A, 917 01 Trnava, Slovakia; [email protected] * Correspondence: [email protected]; Tel.: +421-45-520-6189 Received: 2 August 2019; Accepted: 9 September 2019; Published: 12 September 2019 Abstract: The ability to do business successfully and to stay on the market is a unique feature of each company ensured by highly engaged and high-quality employees. Therefore, innovative leaders able to manage, motivate, and encourage other employees can be a great competitive advantage of an enterprise. Knowledge of important personality factors regarding leadership, incentives and stimulus, systematic assessment, and subsequent motivation factors are parts of human capital and essential conditions for effective development of its potential. Familiarity with various ways to motivate leaders and their implementation in practice are important for improving the work performance and reaching business goals. -
Pearson-Fisher Chi-Square Statistic Revisited
Information 2011 , 2, 528-545; doi:10.3390/info2030528 OPEN ACCESS information ISSN 2078-2489 www.mdpi.com/journal/information Communication Pearson-Fisher Chi-Square Statistic Revisited Sorana D. Bolboac ă 1, Lorentz Jäntschi 2,*, Adriana F. Sestra ş 2,3 , Radu E. Sestra ş 2 and Doru C. Pamfil 2 1 “Iuliu Ha ţieganu” University of Medicine and Pharmacy Cluj-Napoca, 6 Louis Pasteur, Cluj-Napoca 400349, Romania; E-Mail: [email protected] 2 University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca, 3-5 M ănăş tur, Cluj-Napoca 400372, Romania; E-Mails: [email protected] (A.F.S.); [email protected] (R.E.S.); [email protected] (D.C.P.) 3 Fruit Research Station, 3-5 Horticultorilor, Cluj-Napoca 400454, Romania * Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel: +4-0264-401-775; Fax: +4-0264-401-768. Received: 22 July 2011; in revised form: 20 August 2011 / Accepted: 8 September 2011 / Published: 15 September 2011 Abstract: The Chi-Square test (χ2 test) is a family of tests based on a series of assumptions and is frequently used in the statistical analysis of experimental data. The aim of our paper was to present solutions to common problems when applying the Chi-square tests for testing goodness-of-fit, homogeneity and independence. The main characteristics of these three tests are presented along with various problems related to their application. The main problems identified in the application of the goodness-of-fit test were as follows: defining the frequency classes, calculating the X2 statistic, and applying the χ2 test. -
Measures of Association for Contingency Tables
Newsom Psy 525/625 Categorical Data Analysis, Spring 2021 1 Measures of Association for Contingency Tables The Pearson chi-squared statistic and related significance tests provide only part of the story of contingency table results. Much more can be gleaned from contingency tables than just whether the results are different from what would be expected due to chance (Kline, 2013). For many data sets, the sample size will be large enough that even small departures from expected frequencies will be significant. And, for other data sets, we may have low power to detect significance. We therefore need to know more about the strength of the magnitude of the difference between the groups or the strength of the relationship between the two variables. Phi The most common measure of magnitude of effect for two binary variables is the phi coefficient. Phi can take on values between -1.0 and 1.0, with 0.0 representing complete independence and -1.0 or 1.0 representing a perfect association. In probability distribution terms, the joint probabilities for the cells will be equal to the product of their respective marginal probabilities, Pn( ij ) = Pn( i++) Pn( j ) , only if the two variables are independent. The formula for phi is often given in terms of a shortcut notation for the frequencies in the four cells, called the fourfold table. Azen and Walker Notation Fourfold table notation n11 n12 A B n21 n22 C D The equation for computing phi is a fairly simple function of the cell frequencies, with a cross- 1 multiplication and subtraction of the two sets of diagonal cells in the numerator. -
WILLIAM GEMMELL COCHRAN July 15, 1909-March29, 1980
NATIONAL ACADEMY OF SCIENCES WILLIAM GEMMELL C OCHRAN 1909—1980 A Biographical Memoir by MORRIS HANSEN AND FREDERICK MOSTELLER Any opinions expressed in this memoir are those of the author(s) and do not necessarily reflect the views of the National Academy of Sciences. Biographical Memoir COPYRIGHT 1987 NATIONAL ACADEMY OF SCIENCES WASHINGTON D.C. WILLIAM GEMMELL COCHRAN July 15, 1909-March29, 1980 BY MORRIS HANSEN AND FREDERICK MOSTELLER ILLIAM GEMMELL COCHRAN was born into modest Wcircumstances on July 15, 1909, in Rutherglen, Scot- land. His father, Thomas, the eldest of seven children, had begun his lifetime employment with the railroad at the age of thirteen. The family, consisting of Thomas, his wife Jean- nie, and sons Oliver and William, moved to Gourock, a hol- iday resort town on the Firth of Clyde, when William was six, and to Glasgow ten years later. Oliver has colorful recollections of their childhood. At age five, Willie (pronounced Wully), as he was known to family and friends, was hospitalized for a burst appendix, and his life hung in the balance for a day. But soon he was home, wearying his family with snatches of German taught him by a German patient in his nursing-home ward. Willie had a knack for hearing or reading something and remembering it. Oliver recalls that throughout his life, Willie would walk or sit around reciting poems, speeches, advertisements, mu- sic hall songs, and in later life oratorios and choral works he was learning. Until Willie was sixteen, the family lived in an apartment known in Scotland as a "two room and kitchen"—a parlor- cum-dining room (used on posh occasions, about twelve times a year), a bedroom used by the parents, and a kitchen. -
The Development of Statistical Design and Analysis Concepts at Rothamsted
The Development of Statistical Design and Analysis Concepts at Rothamsted Roger Payne, VSN International,Hemel Hempstead, Herts, UK (and Rothamsted Research, Harpenden, Herts UK). email: [email protected] DEMA 2011 1/9/11 Rothamsted 100 years ago •significance test (Arbuthnot 1710) •Bayes theorem (Bayes 1763) •least squares (Gauss 1809, Legendre 1805) •central limit theorem (Laplace 1812) • distributions of large-samples tend to become Normal •"Biometric School" Karl Pearson, University College • correlation, chi-square, method of moments •t-test (Gosset 1908) •Fisher in Stats Methods for Research Workers • "..traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow!" .. Rothamsted • Broadbalk – set up by Sir John Lawes in 1843 to study the effects of inorganic fertilisers on crop yields • had some traces of factorial structure, but no replication, no randomization and no blocking • and not analysed statistically until 1919.. Broadbalk •treatments on the strips 01 (Fym) N4 11 N4 P Mg 21 Fym N3 12 N1+3+1 (P) K2 Mg2 22 Fym 13 N4 P K 03 Nil 14 N4 P K* (Mg*) 05 (P) K Mg 15 N5 (P) K Mg 06 N1 (P) K Mg 16 N6 (P) K Mg 07 N2 (P) K Mg 17 N1+4+1 P K Mg 08 N3 (P) K Mg 18 N1+2+1 P K Mg 09 N4 (P) K Mg 19 N1+1+1 K Mg 10 N4 20 N4 K Mg •hints of factorial structure (e.g. -
Chi-Square Tests
Chi-Square Tests Nathaniel E. Helwig Associate Professor of Psychology and Statistics University of Minnesota October 17, 2020 Copyright c 2020 by Nathaniel E. Helwig Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 1 / 32 Table of Contents 1. Goodness of Fit 2. Tests of Association (for 2-way Tables) 3. Conditional Association Tests (for 3-way Tables) Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 2 / 32 Goodness of Fit Table of Contents 1. Goodness of Fit 2. Tests of Association (for 2-way Tables) 3. Conditional Association Tests (for 3-way Tables) Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 3 / 32 Goodness of Fit A Primer on Categorical Data Analysis In the previous chapter, we looked at inferential methods for a single proportion or for the difference between two proportions. In this chapter, we will extend these ideas to look more generally at contingency table analysis. All of these methods are a form of \categorical data analysis", which involves statistical inference for nominal (or categorial) variables. Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 4 / 32 Goodness of Fit Categorical Data with J > 2 Levels Suppose that X is a categorical (i.e., nominal) variable that has J possible realizations: X 2 f0;:::;J − 1g. Furthermore, suppose that P (X = j) = πj where πj is the probability that X is equal to j for j = 0;:::;J − 1. PJ−1 J−1 Assume that the probabilities satisfy j=0 πj = 1, so that fπjgj=0 defines a valid probability mass function for the random variable X. -
11 Basic Concepts of Testing for Independence
11-3 Contingency Tables Basic Concepts of Testing for Independence In this section we consider contingency tables (or two-way frequency tables), which include frequency counts for A contingency table (or two-way frequency table) is a table in categorical data arranged in a table with a least two rows and which frequencies correspond to two variables. at least two columns. (One variable is used to categorize rows, and a second We present a method for testing the claim that the row and variable is used to categorize columns.) column variables are independent of each other. Contingency tables have at least two rows and at least two columns. 11 Example Definition Below is a contingency table summarizing the results of foot procedures as a success or failure based different treatments. Test of Independence A test of independence tests the null hypothesis that in a contingency table, the row and column variables are independent. Notation Hypotheses and Test Statistic O represents the observed frequency in a cell of a H0 : The row and column variables are independent. contingency table. H1 : The row and column variables are dependent. E represents the expected frequency in a cell, found by assuming that the row and column variables are ()OE− 2 independent χ 2 = ∑ r represents the number of rows in a contingency table (not E including labels). (row total)(column total) c represents the number of columns in a contingency table E = (not including labels). (grand total) O is the observed frequency in a cell and E is the expected frequency in a cell. -
Good, Irving John Employment Grants Degrees and Honors
CV_July_24_04 Good, Irving John Born in London, England, December 9, 1916 Employment Foreign Office, 1941-45. Worked at Bletchley Park on Ultra (both the Enigma and a Teleprinter encrypting machine the Schlüsselzusatz SZ40 and 42 which we called Tunny or Fish) as the main statistician under A. M. Turing, F. R. S., C. H. O. D. Alexander (British Chess Champion twice), and M. H. A. Newman, F. R. S., in turn. (All three were friendly bosses.) Lecturer in Mathematics and Electronic Computing, Manchester University, 1945-48. Government Communications Headquarters, U.K., 1948-59. Visiting Research Associate Professor, Princeton, 1955 (Summer). Consultant to IBM for a few weeks, 1958/59. (Information retrieval and evaluation of the Perceptron.) Admiralty Research Laboratory, 1959-62. Consultant, Communications Research Division of the Institute for Defense Analysis, 1962-64. Senior Research Fellow, Trinity College, Oxford, and Atlas Computer Laboratory, Science Research Council, Great Britain, 1964-67. Professor (research) of Statistics, Virginia Polytechnic Institute and State University since July, 1967. University Distinguished Professor since Nov. 19, 1969. Emeritus since July 1994. Adjunct Professor of the Center for the Study of Science in Society from October 19, 1983. Adjunct Professor of Philosophy from April 6, 1984. Grants A grant, Probability Estimation, from the U.S. Dept. of Health, Education and Welfare; National Institutes of Health, No. R01 Gm 19770, was funded in 1970 and was continued until July 31, 1989. (Principal Investigator.) Also involved in a grant obtained by G. Tullock and T. N. Tideman on Public Choice, NSF Project SOC 78-06180; 1977-79. Degrees and Honors Rediscovered irrational numbers at the age of 9 (independently discovered by the Pythagoreans and described by the famous G.H.Hardy as of “profound importance”: A Mathematician’s Apology, §14). -
On Median Tests for Linear Hypotheses G
ON MEDIAN TESTS FOR LINEAR HYPOTHESES G. W. BROWN AND A. M. MOOD THE RAND CORPORATION 1. Introduction and summary This paper discusses the rationale of certain median tests developed by us at Iowa State College on a project supported by the Office of Naval Research. The basic idea on which the tests rest was first presented in connection with a problem in [1], and later we extended that idea to consider a variety of analysis of variance situations and linear regression problems in general. Many of the tests were pre- sented in [2] but without much justification. And in fact the tests are not based on any solid foundation at all but merely on what appeared to us to be reasonable com- promises with the various conflicting practical interests involved. Throughout the paper we shall be mainly concerned with analysis of variance hypotheses. The methods could be discussed just as well in terms of the more gen- eral linear regression problem, but the general ideas and difficulties are more easily described in the simpler case. In fact a simple two by two table is sufficient to illus- trate most of the fundamental problems. After presenting a test devised by Friedman for row and column effects in a two way classification, some median tests for the same problem will be described. Then we shall examine a few other situations, in particular the question of testing for interactions, and it is here that some real troubles arise. The asymptotic character of the tests is described in the final section of the paper.