1 Analysis of Categorical Data


The techniques presented in previous sections cover the analysis of numerical data and variables. So far it has been left open how to analyze categorical variables and data.

1.1 Two-way tables

How should data for categorical variables be presented? Recall that when describing a single categorical variable we use a relative frequency table. When we have data on two categorical variables and want to illustrate how the two variables depend on each other, we use two-way tables.

Two-Way Tables: Data resulting from observations made on two categorical variables can be easily summarized in a two-way table.

Example: Suppose we are interested in the rate of sprouted seeds in three different kinds of water (rainwater, muddy water, tap water). The two categorical variables are sprouted (yes or no) and water type (rainwater, muddy water, or tap water). Suppose each type of water was used to water 100 seeds, and it was then recorded how many of those seeds sprouted. The result can be presented in the following table:

                 sprouted
                 yes     no    total
rainwater         64     36     100
muddy water       74     26     100
tap water         60     40     100
Total            198    102     300

A natural question to ask at this point is whether the choice of water has an effect on the probability that a seed sprouts.

Multiple Comparison: The question in the example could be answered by comparing each type of water with each of the others, i.e. one would have to make the following comparisons:

muddy – rain
muddy – tap
rain – tap

But it would be better to first find out whether there is any difference at all between the probabilities. To do so we introduce the $\chi^2$ test for homogeneity.

1.2 $\chi^2$ Test of Homogeneity and Independence in a Two-Way Table

In this section two different situations and questions concerning two categorical variables will be covered. Both will lead to the same test.
So, let us first introduce the two different types of questions and then the test routine used to answer them.

1. Comparing the distribution of a categorical variable in two or more populations.

We look at two or more samples from different populations and test whether the distributions in all populations are the same. The null hypothesis to be tested in this kind of problem is that the distributions are homogeneous, i.e. they are equal for all populations. For the sprouting example:

H0: The probabilities for sprouting and not sprouting are the same for all types of water.
Ha: The probabilities for sprouting and not sprouting depend on the type of water.

2. Testing two categorical variables for independence.

When you study data that involve two variables, one important consideration is the relationship between the two variables. Does the proportion of measurements in the various categories for factor 1 depend on which category of factor 2 is being observed?

Example: A survey was conducted to evaluate the effectiveness of a new flu vaccine that had been administered in a small community. The vaccine consists of a two-shot sequence in two weeks. A survey of 1000 residents the following spring provided the following information:

           No vaccine   One shot   Two shots   Total
Flu            24           9         13         46
No flu        289         100        565        954
Total         313         109        578       1000

The question to be answered is whether the flu shot had an impact on the incidence of the flu. Equivalently, we could ask whether the incidence of the flu is independent of the vaccine. To answer this question, a $\chi^2$ test can be conducted, testing the following hypotheses:

H0: There is no relationship between treatment and incidence of flu.
Ha: The incidence of flu depends on the amount of flu treatment.

The two questions lead to exactly the same $\chi^2$ test.

The $\chi^2$ distribution: The $\chi^2$ distribution is neither a normal nor a t-distribution. Table D gives upper tail areas for different degrees of freedom.
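As a rough complement to a printed table of upper tail areas, the sketch below computes the upper tail area of a $\chi^2$ distribution directly for 1 and 2 degrees of freedom, where simple closed forms exist. The function name `chi2_upper_tail` is hypothetical and not part of the notes; other degrees of freedom are deliberately left out and would need a table or a statistics library.

```python
import math


def chi2_upper_tail(x, df):
    """Upper tail area P(X > x) for a chi-square variable with df = 1 or 2.

    Illustrative helper (hypothetical name), using two closed forms:
      df = 1: X is the square of a standard normal, so P(X > x) = erfc(sqrt(x / 2)).
      df = 2: X is exponential with mean 2, so P(X > x) = exp(-x / 2).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise NotImplementedError("only df = 1 and df = 2 are covered in this sketch")


# The familiar 5% critical values should give tail areas close to 0.05.
print(chi2_upper_tail(3.841, 1))
print(chi2_upper_tail(5.991, 2))
```

Restricting the sketch to these two cases keeps it dependency-free; they correspond to 2x2 and 2x3 two-way tables.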
The $\chi^2$ Test for Homogeneity and Independence

Given are one categorical variable with R categories and one categorical variable with C categories.

Hypotheses: H0: The two categorical variables are independent. Ha: H0 is not true.

Assumption: The sample size is large. The sample size is considered large enough as long as every expected count is at least 1 and no more than 20% of the expected counts are less than 5.

Test statistic: Compute for every cell of the two-way table the expected frequency

$$E = \text{expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{sample size}}$$

and then

$$\chi^2_0 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

where O is the observed count.

P-value: $P(\chi^2 > \chi^2_0)$, the upper tail area for a $\chi^2$ distribution with (R - 1)(C - 1) df, found in Appendix Table D.

Decision: As usual.

Context: As usual.

Flu example, continued:

1. Hypotheses: H0: The flu is independent of the vaccine status, versus Ha: H0 is not true. We perform this test at a significance level of $\alpha = 0.05$.

2. Assumptions: These are met, since all expected cell counts (computed in step 3) are greater than 5; the smallest is 5.01.

3. Test statistic: The two-way table gives the observed frequencies O. Next calculate the expected cell count E for each cell:

Flu / no vaccine:      (46 · 313)/1000 = 14.40
Flu / one shot:        (46 · 109)/1000 = 5.01
Flu / two shots:       (46 · 578)/1000 = 26.59
No flu / no vaccine:   (954 · 313)/1000 = 298.60
No flu / one shot:     (954 · 109)/1000 = 103.99
No flu / two shots:    (954 · 578)/1000 = 551.41

Put the expected cell counts into the table (in parentheses below the observed counts):

           No vaccine   One shot   Two shots   Total
Flu            24           9         13         46
            (14.40)      (5.01)    (26.59)
No flu        289         100        565        954
           (298.60)    (103.99)   (551.41)
Total         313         109        578       1000

Now find $(O - E)^2/E$ for each cell and add these fractions:

$$\chi^2_0 = 6.404 + 3.169 + 6.944 + 0.309 + 0.153 + 0.335 = 17.313$$

(summing the unrounded terms), with df = (3-1)(2-1) = 2.

4. P-value: For df = 2, Table D gives P-value < 0.0005.

5. Decision: Since the P-value is less than $\alpha$, reject the null hypothesis.

Context: At a significance level of 5% the data provide sufficient evidence that the probability of getting the flu is not the same for all three vaccination groups.
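The arithmetic of step 3 can be checked with a short script. This is a minimal sketch in plain Python, assuming the observed flu counts from the table; the function name `chi_square_statistic` is illustrative and not part of the notes.

```python
def chi_square_statistic(observed):
    """Return (expected counts, chi-square statistic, df) for a two-way table.

    `observed` is a list of rows, each row a list of observed cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # E = (row total)(column total) / sample size, for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over all cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


flu_table = [
    [24, 9, 13],      # Flu: no vaccine, one shot, two shots
    [289, 100, 565],  # No flu
]
expected, chi2, df = chi_square_statistic(flu_table)
for row in expected:
    print([round(e, 2) for e in row])
print(round(chi2, 3), df)  # 17.313 2
```

The same function applies unchanged to any R x C table of counts, since the totals and degrees of freedom are derived from the table itself.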
The nature of the relationship still has to be explored. For example, estimate the following probabilities:

P(flu = yes | vaccine = 0): estimate 24/313 = 0.0767
P(flu = yes | vaccine = 1): estimate 9/109 = 0.0826
P(flu = yes | vaccine = 2): estimate 13/578 = 0.0225

From this we find that the estimated probability of getting the flu for a person who had two shots is much lower than the estimated probabilities for a person who had no shot or only one shot.

Example: Some time ago there was a report in the news that an AIDS vaccine tested in Thailand did not show any effect. The data quoted in the news are presented in the two-way table below (expected cell counts in parentheses):

          Placebo     Vaccine    Total
HIV+        105         106        211
          (105.5)     (105.5)
HIV-       1168        1167       2335
         (1167.5)    (1167.5)
Total      1273        1273       2546

$$\chi^2_0 = 0.00237 + 0.00237 + 0.000214 + 0.000214 = 0.005168$$

and df = (2-1)(2-1) = 1.

Conduct a test of homogeneity:

1. Hypotheses: H0: The probability of HIV infection is the same for the vaccinated and the placebo group, versus Ha: H0 is not true. $\alpha = 0.05$.

2. Assumptions: The expected cell counts are all large enough.

3. Test statistic: See the calculations above: $\chi^2_0 = 0.0052$, with 1 df.

4. P-value: From Table D we find P-value > 0.25.

5. Decision: Do not reject H0, since the P-value is greater than $\alpha$.

6. Context: At significance level $\alpha = 0.05$ the data do not provide sufficient evidence that the HIV infection rate was affected by the vaccine.

1.3 $\chi^2$ Goodness-of-Fit Test

In this section the $\chi^2$ test for comparing the relative frequency distribution from a sample with a given probability distribution is introduced.

Example: A company filling grass seed bags wants to evaluate its filling machine. The following distribution is advertised on the bags, where K1 to K5 are different kinds of grass seeds:

kind of seeds   proportion
K1              0.50
K2              0.25
K3              0.15
K4              0.05
K5              0.05

The company wants to check whether the seed distribution in the bags fits the advertised distribution.
They take a sample of size 1000 and find the following summarized data:

kind of seeds   count
K1              480
K2              233
K3              160
K4               63
K5               64

In order to check whether the label is a truthful description of the contents of the seed bags, we want to compare the claimed distribution with the sample data.

Notation: for a given categorical random variable

k = number of categories
p1 = true probability to fall in category 1
p2 = true probability to fall in category 2
...
pk = true probability to fall in category k

In order to compare the observed frequencies with the hypothesized distribution we study the $\chi^2$ goodness-of-fit statistic.

O1 = observed cell count for category 1
O2 = observed cell count for category 2
...
Ok = observed cell count for category k

These observed counts are compared with the expected counts under the assumption that the claim is true (the label is true). If we examine an event that occurs with probability p, then in a sample of size n we would expect to see this event about n · p times.
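Applied to the seed example, the n · p rule gives one expected count per kind of seed. The following is a minimal sketch, assuming the advertised proportions and the observed counts quoted above; the variable names are illustrative only.

```python
# Advertised proportions p_i and observed counts O_i from the seed example
advertised = {"K1": 0.50, "K2": 0.25, "K3": 0.15, "K4": 0.05, "K5": 0.05}
observed = {"K1": 480, "K2": 233, "K3": 160, "K4": 63, "K5": 64}

n = sum(observed.values())  # sample size, 1000 seeds

# Expected count for each category if the label is true: n * p_i
expected = {kind: n * p for kind, p in advertised.items()}

for kind in advertised:
    diff = observed[kind] - expected[kind]
    print(f"{kind}: observed {observed[kind]}, expected {expected[kind]:.0f}, O - E = {diff:+.0f}")
```

Under the advertised distribution the expected counts are 500, 250, 150, 50 and 50, so K1 and K2 fall short of their expected counts while K3, K4 and K5 exceed them.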