Appendix: Contingency Table Analysis in Practice

Appendix A Appendix: Contingency Table Analysis in Practice A.1 Software for Categorical Data Analysis The free software R, for statistical computing and graphics, is of increasing popularity and usage (R web site: http://www.r-project.org/). Many researchers support their published papers with the related R code. This way, R software is continuously updated and one can find a variety of functions for basic or advanced analysis of categorical data and special types of them. R language and environment is similar to S and code written for S-Plus runs usually under R as well. Furthermore, standard statistical packages, such as SAS, SPSS, and Stata, are well supplied to treat categorical data. Especially in their updated versions, their features concerning categorical data analysis are enriched. They incorporate procedures for applying the recently developed methods and models in categorical data analysis following the new computing strategies. Briefly, one could say that their major new features concern mainly options for exact analysis and analysis of repeated categorical data. Thus, NLMIXED of SAS fits generalized linear mixed models while GEE analysis for marginal models can be performed in GENMOD. SPSS offers the “generalized estimating equations” sub-option under the “GLMs” option. The related R function is gee(). For categorical data analysis with SAS, we refer to Stokes et al. (2012) while a variety of SAS codes are presented and discussed in the Appendix of Agresti (2007, 2013). Advanced models are fitted in R using special functions, developed individually, and included in different libraries. Orientated toward categorical data analysis and models for ordinal data as well are the libraries MASS (Venables and Replay) and VGLM, VGAM developed by Yee (2008). For example, generalized linear mixed models can be fitted through the glmmPQL() function of the MASS library. Other software, as BMDP, Minitab, and SYSTAT, have also components for categorical data inference. M. Kateri, Contingency Table Analysis: Methods and Implementation Using R, 271 Statistics for Industry and Technology, DOI 10.1007/978-0-8176-4811-4, © Springer Science+Business Media New York 2014 272 A Appendix: Contingency Table Analysis in Practice Bayesian analysis of categorical data can be carried out through WINBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml), which is a free software. Another option is to perform categorical data analysis through MATLAB, as Johnson and Albert (2000). The MATLAB functions they used are described in their Appendix. For categorical data analysis, there have been developed also some special packages. Thus, exact analysis of categorical data is performed by StatXact while exact conditional logistic regression can be fitted by LogXact. SUDAAN is specialized for analysis of mixed data from stratified multistage cluster designs. It has also the feature of analyzing marginal models for nominal and ordinal responses by GEE. Software tool for estimating marginal regression models is also MAREG. Finally, some algorithms may be found in Fortran. For example, Haberman (1995) provided a Fortran program for fitting the association model RC(K)by the Newton–Raphson method while Ait-Sidi-Allal et al. (2004) implemented their algorithms for estimating parameters in association and correlation models also in Fortran. A.2 Contingency Table Analysis with R All procedures and models discussed in this book are worked out in R,inafashion aiming that even readers not familiar with R will be able to apply in practice all the models discussed here, even the nontrivial ones, fast and directly. A web companion of the book serves this goal. This section of the Appendix is basically the content description of the web companion of the book, to be found under http://cta.isw.rwth-aachen.de A.2.1 R Packages for Contingency Table Analysis An extensive list of special R packages, useful in the analysis of contingency tables, is provided in the web appendix. A.2.2 Data Input in R Alternative forms of defining contingency tables data in R are presented (matrix(), array(),anddata.frame()) and transformations from one type to another are illustrated. Ways of entering or reading data are discussed. A.3 R Functions Used 273 A.3 R Functions Used The R functions constructed for the descriptive and inferential needs of this book are given in the corresponding section of the web appendix, organized by chapter of their first use. A.3.1 R Functions of Chap.1 • Binomial–Normal Distribution Graph: bin_norm( ) A.3.2 R Functions of Chap.2 • Likelihood Ratio Statistic for Testing Independence in Two-way Contingency Tables: G2( ) • Odds Ratio for a 2 × 2Table:odds.ratio( ) • Local Odds Ratios for an I × J Table: local.odds.DM( ) • Global Odds Ratios for an I × J Table: global.odds.DM( ) • Cumulative Odds Ratios for an I × J Table: cum.odds.DM( ) • Continuation Odds Ratios for an I × J Table: cont.odds.DM( ) • Linear Trend Test: linear.trend( ) • Midrank Scores Computation: midrank( ) • Fourfold Plots for the Local Odds Ratios of an I × J Table: ffold.local( ) A.3.3 R Functions of Chap.3 • Breslow–Day–Tarone Test of Homogeneous Association: BDT( ) • Woolf’s Test of Homogeneous Association: woolf( ) A.3.4 R Functions of Chap.5 • Independence (I) Model for Two-way Contingency Tables: fit.I( ) • Quasi-Independence (QI) Model for Two-way Contingency Tables: fit.QI( ) 274 A Appendix: Contingency Table Analysis in Practice A.3.5 R Functions of Chap.6 • Scores’ Rescaling to Obey the Weighted Constraints (6.17): rescale( ) • Uniform (U) Association Model: fit.U( ) • Row Effect (R) Association Model: fit.R( ) • Column Effect (C) Association Model: fit.C( ) • Row–Column (RC) Association Model: fit.RC( ) • RC(M) Association Model: fit.RCm( ) • Plotting the Row and Column Scores in Two Dimensions: plot_2dim( ) A.3.6 R Functions of Chap.9 • (1 − a)100% Asymptotic Confidence Interval for the Difference of Correlated Proportions: McNemar.CI( ) • Factors Needed to Fit Symmetry Models on an I × I Table in glm: SYMV( ) • Scores’ Rescaling to Satisfy Constraints (9.38): rescale.square( ) A.4 Contingency Table Analysis with SPSS The association and symmetry models cannot be fitted directly in SPSS through the options of the windows commands. Association models that are GLM can be fitted through the GLM option by defining the appropriate vectors, as explained in Sect. 6.6. For all two-way association models (RC(M) included, which is nonlinear and thus cannot be fitted in GLM) and the symmetry models, we provide appropriate syntax codes to be fitted in SPSS MATRIX. In particular, we provide MATRIX codes for: • Independence for two-way tables using SPSS MATRIX • Association models for two-way tables (uniform (U), row effect (R), column effect (C), and RC (M) association models) • Symmetry models References Agresti, A.: Generalized odds ratios for ordinal data. Biometrics 36, 59–67 (1980) Agresti, A.: A survey of strategies for modeling cross–classifications having ordinal variables. J. Am. Stat. Assoc. 78, 184–198 (1983a) Agresti, A.: A simple diagonals–parameters symmetry and quasi-symmetry model. Stat. Probab. Lett. 1, 313–316 (1983b) Agresti, A.: Testing marginal homogeneity for ordinal categorical variables. Biometrics 39, 505–510 (1983c) Agresti, A.: Applying R2-type measures to ordered categorical data. Technometrics 28, 133–138 (1986) Agresti, A.: A model for agreement between ratings on an ordinal scale. Biometrics 44, 539–548 (1988) Agresti, A.: A survey of exact inference for contingency tables (with discussion). Stat. Sci. 7, 131–177 (1992) Agresti, A.: Computing conditional maximum likelihood estimates for generalized Rasch models using simple loglinear models with diagonal parameters. Scand. J. Stat. 20, 63–71 (1993) Agresti, A.: On logit confidence intervals for the odds ratio with small samples. Biometrics 55, 597–602 (1999) Agresti, A.: Exact inference for categorical data: recent advances and continuing controversies. Stat. Med. 20, 2709–2722 (2001) Agresti, A.: Dealing with discreteness: making ‘exact’ confidence intervals for proportions, differences of proportions, and odds ratios more exact. Stat. Meth. Med. Res. 12, 3–21 (2003) Agresti, A:. An Introduction to Categorical Data Analysis, 2nd edn. Wiley, New York (2007) Agresti, A.: Analysis of Ordinal Categorical Data, 2nd edn. Wiley, New York (2010) Agresti, A.: Categorical Data Analysis, 3rd edn. Wiley, Hoboken (2013) Agresti, A., Chuang, C.: Model-based Bayesian methods for estimating cell proportions in cross- classification tables having ordered categories. Comput. Stat. Data Anal. 7, 245–258 (1989) Agresti, A., Coull, B.A.: Approximate is better than ‘exact’ for interval estimation of binomial proportions. Am. Stat. 52, 119–126 (1998) Agresti, A., Coull, B.A.: The analysis of contingency tables under inequality constraints. J. Stat. Plann. Infer. 107, 45–73 (2002) Agresti, A., Hartzel, J.: Strategies for comparing treatments on a binary response with multi-centre data. Stat. Med. 19, 1115–1139 (2000) Agresti, A., Hitchcock, D.B.: Bayesian inference for categorical data analysis. Stat. Methods Appl. 14, 297–330 (2005) Agresti, A., Kateri, M.: Some remarks on latent variable models in categorical data analysis. Commun. Stat. Theory Meth. 43, 1–14 (2014) M. Kateri, Contingency Table Analysis: Methods and Implementation Using R, 275 Statistics for Industry and Technology, DOI 10.1007/978-0-8176-4811-4, © Springer Science+Business Media New York 2014 276 References Agresti, A., Kezouh, A.: Association models for multidimensional cross-classifications of ordinal variables. Commun. Stat. Ser. A 12, 1261–1276 (1983) Agresti, A., Lang, J.B.: Quasi-symmetric latent class models, with application to rater agreement. Biometrics 49, 131–139 (1993) Agresti, A., Min, Y.: On small-sample confidence intervals for parameters in discrete distributions. Biometrics 57, 963–971 (2001) Agresti, A., Min, Y.: Unconditional small-sample confidence intervals for the odds ratio. Biostatis- tics 3, 379–386 (2002) Agresti, A., Yang, M.: An empirical investigation of some effects of sparseness in contingency tables.

Load more