X2 & G Tests for Association in Rxc Contingency Tables
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Contingency Tables Are Eaten by Large Birds of Prey
Case Study Case Study Example 9.3 beginning on page 213 of the text describes an experiment in which fish are placed in a large tank for a period of time and some Contingency Tables are eaten by large birds of prey. The fish are categorized by their level of parasitic infection, either uninfected, lightly infected, or highly infected. It is to the parasites' advantage to be in a fish that is eaten, as this provides Bret Hanlon and Bret Larget an opportunity to infect the bird in the parasites' next stage of life. The observed proportions of fish eaten are quite different among the categories. Department of Statistics University of Wisconsin|Madison Uninfected Lightly Infected Highly Infected Total October 4{6, 2011 Eaten 1 10 37 48 Not eaten 49 35 9 93 Total 50 45 46 141 The proportions of eaten fish are, respectively, 1=50 = 0:02, 10=45 = 0:222, and 37=46 = 0:804. Contingency Tables 1 / 56 Contingency Tables Case Study Infected Fish and Predation 2 / 56 Stacked Bar Graph Graphing Tabled Counts Eaten Not eaten 50 40 A stacked bar graph shows: I the sample sizes in each sample; and I the number of observations of each type within each sample. 30 This plot makes it easy to compare sample sizes among samples and 20 counts within samples, but the comparison of estimates of conditional Frequency probabilities among samples is less clear. 10 0 Uninfected Lightly Infected Highly Infected Contingency Tables Case Study Graphics 3 / 56 Contingency Tables Case Study Graphics 4 / 56 Mosaic Plot Mosaic Plot Eaten Not eaten 1.0 0.8 A mosaic plot replaces absolute frequencies (counts) with relative frequencies within each sample. -
Use of Chi-Square Statistics
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2008, The Johns Hopkins University and Marie Diener-West. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed. Use of the Chi-Square Statistic Marie Diener-West, PhD Johns Hopkins University Section A Use of the Chi-Square Statistic in a Test of Association Between a Risk Factor and a Disease The Chi-Square ( X2) Statistic Categorical data may be displayed in contingency tables The chi-square statistic compares the observed count in each table cell to the count which would be expected under the assumption of no association between the row and column classifications The chi-square statistic may be used to test the hypothesis of no association between two or more groups, populations, or criteria Observed counts are compared to expected counts 4 Displaying Data in a Contingency Table Criterion 2 Criterion 1 1 2 3 . C Total . 1 n11 n12 n13 n1c r1 2 n21 n22 n23 . n2c r2 . 3 n31 . r nr1 nrc rr Total c1 c2 cc n 5 Chi-Square Test Statistic -
Knowledge and Human Capital As Sustainable Competitive Advantage in Human Resource Management
sustainability Article Knowledge and Human Capital as Sustainable Competitive Advantage in Human Resource Management Miloš Hitka 1 , Alžbeta Kucharˇcíková 2, Peter Štarcho ˇn 3 , Žaneta Balážová 1,*, Michal Lukáˇc 4 and Zdenko Stacho 5 1 Faculty of Wood Sciences and Technology, Technical University in Zvolen, T. G. Masaryka 24, 960 01 Zvolen, Slovakia; [email protected] 2 Faculty of Management Science and Informatics, University of Žilina, Univerzitná 8215/1, 010 26 Žilina, Slovakia; [email protected] 3 Faculty of Management, Comenius University in Bratislava, Odbojárov 10, P.O. BOX 95, 82005 Bratislava, Slovakia; [email protected] 4 Faculty of Social Sciences, University of SS. Cyril and Methodius in Trnava, Buˇcianska4/A, 917 01 Trnava, Slovakia; [email protected] 5 Institut of Civil Society, University of SS. Cyril and Methodius in Trnava, Buˇcianska4/A, 917 01 Trnava, Slovakia; [email protected] * Correspondence: [email protected]; Tel.: +421-45-520-6189 Received: 2 August 2019; Accepted: 9 September 2019; Published: 12 September 2019 Abstract: The ability to do business successfully and to stay on the market is a unique feature of each company ensured by highly engaged and high-quality employees. Therefore, innovative leaders able to manage, motivate, and encourage other employees can be a great competitive advantage of an enterprise. Knowledge of important personality factors regarding leadership, incentives and stimulus, systematic assessment, and subsequent motivation factors are parts of human capital and essential conditions for effective development of its potential. Familiarity with various ways to motivate leaders and their implementation in practice are important for improving the work performance and reaching business goals. -
Statistical Characterization of Tissue Images for Detec- Tion and Classification of Cervical Precancers
Statistical characterization of tissue images for detec- tion and classification of cervical precancers Jaidip Jagtap1, Nishigandha Patil2, Chayanika Kala3, Kiran Pandey3, Asha Agarwal3 and Asima Pradhan1, 2* 1Department of Physics, IIT Kanpur, U.P 208016 2Centre for Laser Technology, IIT Kanpur, U.P 208016 3G.S.V.M. Medical College, Kanpur, U.P. 208016 * Corresponding author: [email protected], Phone: +91 512 259 7971, Fax: +91-512-259 0914. Abstract Microscopic images from the biopsy samples of cervical cancer, the current “gold standard” for histopathology analysis, are found to be segregated into differing classes in their correlation properties. Correlation domains clearly indicate increasing cellular clustering in different grades of pre-cancer as compared to their normal counterparts. This trend manifests in the probabilities of pixel value distribution of the corresponding tissue images. Gradual changes in epithelium cell density are reflected well through the physically realizable extinction coefficients. Robust statistical parameters in the form of moments, characterizing these distributions are shown to unambiguously distinguish tissue types. These parameters can effectively improve the diagnosis and classify quantitatively normal and the precancerous tissue sections with a very high degree of sensitivity and specificity. Key words: Cervical cancer; dysplasia, skewness; kurtosis; entropy, extinction coefficient,. 1. Introduction Cancer is a leading cause of death worldwide, with cervical cancer being the fifth most common cancer in women [1-2]. It originates as a few abnormal cells in the initial stage and then spreads rapidly. Treatment of cancer is often ineffective in the later stages, which makes early detection the key to survival. Pre-cancerous cells can sometimes take 10-15 years to develop into cancer and regular tests such as pap-smear are recommended. -
Pearson-Fisher Chi-Square Statistic Revisited
Information 2011 , 2, 528-545; doi:10.3390/info2030528 OPEN ACCESS information ISSN 2078-2489 www.mdpi.com/journal/information Communication Pearson-Fisher Chi-Square Statistic Revisited Sorana D. Bolboac ă 1, Lorentz Jäntschi 2,*, Adriana F. Sestra ş 2,3 , Radu E. Sestra ş 2 and Doru C. Pamfil 2 1 “Iuliu Ha ţieganu” University of Medicine and Pharmacy Cluj-Napoca, 6 Louis Pasteur, Cluj-Napoca 400349, Romania; E-Mail: [email protected] 2 University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca, 3-5 M ănăş tur, Cluj-Napoca 400372, Romania; E-Mails: [email protected] (A.F.S.); [email protected] (R.E.S.); [email protected] (D.C.P.) 3 Fruit Research Station, 3-5 Horticultorilor, Cluj-Napoca 400454, Romania * Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel: +4-0264-401-775; Fax: +4-0264-401-768. Received: 22 July 2011; in revised form: 20 August 2011 / Accepted: 8 September 2011 / Published: 15 September 2011 Abstract: The Chi-Square test (χ2 test) is a family of tests based on a series of assumptions and is frequently used in the statistical analysis of experimental data. The aim of our paper was to present solutions to common problems when applying the Chi-square tests for testing goodness-of-fit, homogeneity and independence. The main characteristics of these three tests are presented along with various problems related to their application. The main problems identified in the application of the goodness-of-fit test were as follows: defining the frequency classes, calculating the X2 statistic, and applying the χ2 test. -
The Instat Guide to Choosing and Interpreting Statistical Tests
Statistics for biologists The InStat Guide to Choosing and Interpreting Statistical Tests GraphPad InStat Version 3 for Macintosh By GraphPad Software, Inc. © 1990-2001,,, GraphPad Soffftttware,,, IIInc... Allllll riiighttts reserved... Program design, manual and help Dr. Harvey J. Motulsky screens: Paige Searle Programming: Mike Platt John Pilkington Harvey Motulsky Maciiintttosh conversiiion by Soffftttware MacKiiiev... www...mackiiiev...com Project Manager: Dennis Radushev Programmers: Alexander Bezkorovainy Dmitry Farina Quality Assurance: Lena Filimonihina Help and Manual: Pavel Noga Andrew Yeremenko InStat and GraphPad Prism are registered trademarks of GraphPad Software, Inc. All Rights Reserved. Manufactured in the United States of America. Except as permitted under the United States copyright law of 1976, no part of this publication may be reproduced or distributed in any form or by any means without the prior written permission of the publisher. Use of the software is subject to the restrictions contained in the accompanying software license agreement. How ttto reach GraphPad: Phone: (US) 858-457-3909 Fax: (US) 858-457-8141 Email: [email protected] or [email protected] Web: www.graphpad.com Mail: GraphPad Software, Inc. 5755 Oberlin Drive #110 San Diego, CA 92121 USA The entire text of this manual is available on-line at www.graphpad.com Contents Welcome to InStat........................................................................................7 The InStat approach ---------------------------------------------------------------7 -
Measures of Association for Contingency Tables
Newsom Psy 525/625 Categorical Data Analysis, Spring 2021 1 Measures of Association for Contingency Tables The Pearson chi-squared statistic and related significance tests provide only part of the story of contingency table results. Much more can be gleaned from contingency tables than just whether the results are different from what would be expected due to chance (Kline, 2013). For many data sets, the sample size will be large enough that even small departures from expected frequencies will be significant. And, for other data sets, we may have low power to detect significance. We therefore need to know more about the strength of the magnitude of the difference between the groups or the strength of the relationship between the two variables. Phi The most common measure of magnitude of effect for two binary variables is the phi coefficient. Phi can take on values between -1.0 and 1.0, with 0.0 representing complete independence and -1.0 or 1.0 representing a perfect association. In probability distribution terms, the joint probabilities for the cells will be equal to the product of their respective marginal probabilities, Pn( ij ) = Pn( i++) Pn( j ) , only if the two variables are independent. The formula for phi is often given in terms of a shortcut notation for the frequencies in the four cells, called the fourfold table. Azen and Walker Notation Fourfold table notation n11 n12 A B n21 n22 C D The equation for computing phi is a fairly simple function of the cell frequencies, with a cross- 1 multiplication and subtraction of the two sets of diagonal cells in the numerator. -
Chi-Square Tests
Chi-Square Tests Nathaniel E. Helwig Associate Professor of Psychology and Statistics University of Minnesota October 17, 2020 Copyright c 2020 by Nathaniel E. Helwig Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 1 / 32 Table of Contents 1. Goodness of Fit 2. Tests of Association (for 2-way Tables) 3. Conditional Association Tests (for 3-way Tables) Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 2 / 32 Goodness of Fit Table of Contents 1. Goodness of Fit 2. Tests of Association (for 2-way Tables) 3. Conditional Association Tests (for 3-way Tables) Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 3 / 32 Goodness of Fit A Primer on Categorical Data Analysis In the previous chapter, we looked at inferential methods for a single proportion or for the difference between two proportions. In this chapter, we will extend these ideas to look more generally at contingency table analysis. All of these methods are a form of \categorical data analysis", which involves statistical inference for nominal (or categorial) variables. Nathaniel E. Helwig (Minnesota) Chi-Square Tests c October 17, 2020 4 / 32 Goodness of Fit Categorical Data with J > 2 Levels Suppose that X is a categorical (i.e., nominal) variable that has J possible realizations: X 2 f0;:::;J − 1g. Furthermore, suppose that P (X = j) = πj where πj is the probability that X is equal to j for j = 0;:::;J − 1. PJ−1 J−1 Assume that the probabilities satisfy j=0 πj = 1, so that fπjgj=0 defines a valid probability mass function for the random variable X. -
Estimating Suitable Probability Distribution Function for Multimodal Traffic Distribution Function
Journal of the Korean Society of Marine Environment & Safety Research Paper Vol. 21, No. 3, pp. 253-258, June 30, 2015, ISSN 1229-3431(Print) / ISSN 2287-3341(Online) http://dx.doi.org/10.7837/kosomes.2015.21.3.253 Estimating Suitable Probability Distribution Function for Multimodal Traffic Distribution Function Sang-Lok Yoo* ․ Jae-Yong Jeong** ․ Jeong-Bin Yim** * Graduate school of Mokpo National Maritime University, Mokpo 530-729, Korea ** Professor, Mokpo National Maritime University, Mokpo 530-729, Korea Abstract : The purpose of this study is to find suitable probability distribution function of complex distribution data like multimodal. Normal distribution is broadly used to assume probability distribution function. However, complex distribution data like multimodal are very hard to be estimated by using normal distribution function only, and there might be errors when other distribution functions including normal distribution function are used. In this study, we experimented to find fit probability distribution function in multimodal area, by using AIS(Automatic Identification System) observation data gathered in Mokpo port for a year of 2013. By using chi-squared statistic, gaussian mixture model(GMM) is the fittest model rather than other distribution functions, such as extreme value, generalized extreme value, logistic, and normal distribution. GMM was found to the fit model regard to multimodal data of maritime traffic flow distribution. Probability density function for collision probability and traffic flow distribution will be calculated much precisely in the future. Key Words : Probability distribution function, Multimodal, Gaussian mixture model, Normal distribution, Maritime traffic flow 1. Introduction* traffic time and speed. Some studies estimated the collision probability when the ship Maritime traffic flow is affected by the volume of traffic, tidal is in confronting or passing by applying it to normal distribution current, wave height, and so on. -
11 Basic Concepts of Testing for Independence
11-3 Contingency Tables Basic Concepts of Testing for Independence In this section we consider contingency tables (or two-way frequency tables), which include frequency counts for A contingency table (or two-way frequency table) is a table in categorical data arranged in a table with a least two rows and which frequencies correspond to two variables. at least two columns. (One variable is used to categorize rows, and a second We present a method for testing the claim that the row and variable is used to categorize columns.) column variables are independent of each other. Contingency tables have at least two rows and at least two columns. 11 Example Definition Below is a contingency table summarizing the results of foot procedures as a success or failure based different treatments. Test of Independence A test of independence tests the null hypothesis that in a contingency table, the row and column variables are independent. Notation Hypotheses and Test Statistic O represents the observed frequency in a cell of a H0 : The row and column variables are independent. contingency table. H1 : The row and column variables are dependent. E represents the expected frequency in a cell, found by assuming that the row and column variables are ()OE− 2 independent χ 2 = ∑ r represents the number of rows in a contingency table (not E including labels). (row total)(column total) c represents the number of columns in a contingency table E = (not including labels). (grand total) O is the observed frequency in a cell and E is the expected frequency in a cell. -
Continuous Dependent Variable Models
Chapter 4 Continuous Dependent Variable Models CHAPTER 4; SECTION A: ANALYSIS OF VARIANCE Purpose of Analysis of Variance: Analysis of Variance is used to analyze the effects of one or more independent variables (factors) on the dependent variable. The dependent variable must be quantitative (continuous). The dependent variable(s) may be either quantitative or qualitative. Unlike regression analysis no assumptions are made about the relation between the independent variable and the dependent variable(s). The theory behind ANOVA is that a change in the magnitude (factor level) of one or more of the independent variables or combination of independent variables (interactions) will influence the magnitude of the response, or dependent variable, and is indicative of differences in parent populations from which the samples were drawn. Analysis is Variance is the basic analytical procedure used in the broad field of experimental designs, and can be used to test the difference in population means under a wide variety of experimental settings—ranging from fairly simple to extremely complex experiments. Thus, it is important to understand that the selection of an appropriate experimental design is the first step in an Analysis of Variance. The following section discusses some of the fundamental differences in basic experimental designs—with the intent merely to introduce the reader to some of the basic considerations and concepts involved with experimental designs. The references section points to some more detailed texts and references on the subject, and should be consulted for detailed treatment on both basic and advanced experimental designs. Examples: An analyst or engineer might be interested to assess the effect of: 1. -
Fitting Population Models to Multiple Sources of Observed Data Author(S): Gary C
Fitting Population Models to Multiple Sources of Observed Data Author(s): Gary C. White and Bruce C. Lubow Reviewed work(s): Source: The Journal of Wildlife Management, Vol. 66, No. 2 (Apr., 2002), pp. 300-309 Published by: Allen Press Stable URL: http://www.jstor.org/stable/3803162 . Accessed: 07/05/2012 19:10 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. Allen Press is collaborating with JSTOR to digitize, preserve and extend access to The Journal of Wildlife Management. http://www.jstor.org FITTINGPOPULATION MODELS TO MULTIPLESOURCES OF OBSERVEDDATA GARYC. WHITE,'Department of Fisheryand WildlifeBiology, Colorado State University,Fort Collins, CO 80523, USA BRUCEC. LUBOW,Colorado Cooperative Fish and WildlifeUnit, Colorado State University,Fort Collins, CO 80523, USA Abstract:The use of populationmodels based on severalsources of data to set harvestlevels is a standardproce- dure most westernstates use for managementof mule deer (Odocoileushemionus), elk (Cervuselaphus), and other game populations.We present a model-fittingprocedure to estimatemodel parametersfrom multiplesources of observeddata using weighted least squares and model selectionbased on Akaike'sInformation Criterion. The pro- cedure is relativelysimple to implementwith modernspreadsheet software. We illustratesuch an implementation using an examplemule deer population.Typical data requiredinclude age and sex ratios,antlered and antlerless harvest,and populationsize.