Datasets, R Packages, and Internet Resources

Appendix A

© Springer International Publishing Switzerland 2015. F.E. Harrell, Jr., Regression Modeling Strategies, Springer Series in Statistics, DOI 10.1007/978-3-319-19425-7

Central Web Site and Datasets

The web site for information related to this book is biostat.mc.vanderbilt.edu/rms, and a related web site for a full-semester course based on the book is http://biostat.mc.vanderbilt.edu/CourseBios330. The main site contains links to several other web sites and a link to the dataset repository that holds most of the datasets mentioned in the text for downloading. These datasets are in fully annotated R save files (.sav suffixes; by convention these should have had .rda suffixes); some of these are also available in other formats. The datasets were selected because of the variety of types of response and predictor variables, sample size, and numbers of missing values. In R they may be read with the load function, with load(url()) to read directly from the Web, or with the Hmisc package's getHdata function, which also reads directly from the Web (as is done in the code in the case studies).

From the web site there are links to other useful dataset sources. Links to presentations and technical reports related to the text are also found on this site, as is information for instructors for obtaining quizzes and answer sheets, extra problems, and solutions to these and to many of the problems in the text. Details about short courses based on the text are also found there. The main site also has Chapter 7 from the first edition, which is a case study in ordinary least squares modeling.

R Packages

The rms package written by the author maintains detailed information about a model's design matrix so that many analyses using the model fit are automated. rms is a large package of R functions. Most of the functions in rms analyze model fits, validate them, or make presentation graphics from them, but the packages also contain special model-fitting functions for binary and ordinal logistic regression (optionally using penalized maximum likelihood), unpenalized ordinal regression with a variety of link functions, penalized and unpenalized least squares, and parametric and semiparametric survival models. In addition, rms handles quantile regression and longitudinal analysis using generalized least squares. The rms package pays special attention to computing predicted values in that design matrix attributes (e.g., knots for splines, categories for categorical predictors) are "remembered" so that predictors are properly transformed while predictions are being generated. The functions make extensive use of a wealth of survival analysis software written by Terry Therneau of the Mayo Foundation. This survival package is a standard part of R.

The author's Hmisc package contains other miscellaneous functions used in the text. These are functions that do not operate on model fits that used the enhanced design attributes stored by the rms package. Functions in Hmisc include facilities for data reduction, imputation, power and sample size calculation, advanced table making, recoding variables, translating SAS datasets into R data frames while preserving all data attributes (including variable and value labels and special missing values), drawing and annotating plots [371], and converting certain R objects to LaTeX typeset form. The latter capability, provided by a family of latex functions, completes the conversion to LaTeX of many of the objects created by rms. The packages contain several LaTeX methods that create LaTeX code for typesetting model fits in algebraic notation, for printing ANOVA and regression effect (e.g., odds ratio) tables, and other applications.
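
As a hedged illustration of how the design attributes described above are "remembered", the following minimal sketch fits a restricted cubic spline to simulated data and then predicts at a few rows. The data and the names age, y, and d are invented for illustration and do not come from the text. The calls used (ols, rcs, specs, predict) are rms functions, and the sketch assumes that rms and Hmisc are already installed.

```r
# Sketch only: simulated data; the names age, y, and d are hypothetical.
# Assumes rms (and Hmisc, which rms requires) are installed, e.g. via
# install.packages("rms").
library(rms)

set.seed(1)
n   <- 200
age <- runif(n, 20, 80)
y   <- 0.02 * (age - 50)^2 + rnorm(n)
d   <- data.frame(age, y)

# Least squares fit with a 4-knot restricted cubic spline in age.
# The knot locations are chosen from the data when the model is fitted.
f <- ols(y ~ rcs(age, 4), data = d)

# Show the stored design specifications, including the spline knots.
specs(f)

# Predict for the full data and for only the first three rows.
# The stored knots are reused in both cases; three rows alone could
# not determine four knots.
pred_all   <- predict(f, newdata = d)
pred_small <- predict(f, newdata = d[1:3, , drop = FALSE])
print(pred_small)
```

Predicting for only three rows gives the same values as the first three predictions for the full data. That would not be possible if the knots were re-estimated from the new observations instead of being taken from the fit.
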
The LaTeX methods were used extensively in the text, especially for writing restricted cubic spline function fits in simplest notation.

The latest version of the rms package is available from CRAN (see below). It is necessary to install the Hmisc package in order to use the rms package. The Web site also contains more in-depth overviews of the packages, which run on UNIX, Linux, Mac, and Microsoft Windows systems. The packages may be automatically downloaded and installed using R's install.packages function or using menus under R graphical user interfaces.

R-help, CRAN, and Discussion Boards

To subscribe to the highly informative and helpful R-help e-mail group, see the Web site. R-help is appropriate for asking general questions about R including those about finding or writing functions to do specific analyses (for questions specific to a package, contact the author of that package). Another resource is the CRAN repository at www.r-project.org. Another excellent resource for asking questions about R is stackoverflow.com/questions/tagged/r. There is a Google group regmod devoted to the book and courses.

Multiple Imputation

The Impute e-mail list maintained by Juned Siddique of Northwestern University is an invaluable source of information regarding missing data problems. To subscribe to this list, see the Web site. Other excellent sources of online information are Joseph Schafer's "Multiple Imputation Frequently Asked Questions" site and Stef van Buuren and Karin Oudshoorn's "Multiple Imputation Online" site, for which links exist on the main Web site.

Bibliography

An extensive annotated bibliography containing all the references in this text as well as other references concerning predictive methods, survival analysis, logistic regression, prognosis, diagnosis, modeling strategies, model validation, practical Bayesian methods, clinical trials, graphical methods, papers for teaching statistical methods, the bootstrap, and many other areas may be found at http://www.citeulike.org/user/harrelfe.

SAS

SAS macros for fitting restricted cubic splines and for other basic operations are freely available from the main Web site. The Web site also has notes on SAS usage for some of the methods presented in the text.

References

Numbers following are page numbers of citations.

1. O. O. Aalen. Nonparametric inference in connection with multiple decrement models. Scand J Stat, 3:15–27, 1976. 413
2. O. O. Aalen. Further results on the non-parametric linear regression model in survival analysis. Stat Med, 12:1569–1588, 1993. 518
3. O. O. Aalen, E. Bjertness, and T. Sønju. Analysis of dependent survival data applied to lifetimes of amalgam fillings. Stat Med, 14:1819–1829, 1995. 421
4. M. Abrahamowicz, T. MacKenzie, and J. M. Esdaile. Time-dependent hazard ratio: Modeling and hypothesis testing with applications in lupus nephritis. J Am Stat Assoc, 91:1432–1439, 1996. 501
5. A. Agresti. A survey of models for repeated ordered categorical response data. Stat Med, 8:1209–1224, 1989. 324
6. A. Agresti. Categorical data analysis. Wiley, Hoboken, NJ, second edition, 2002. 271
7. H. Ahn and W. Loh. Tree-structured proportional hazards regression modeling. Biometrics, 50:471–485, 1994. 41, 178
8. J. Aitchison and S. D. Silvey. The generalization of probit analysis to the case of multiple responses. Biometrika, 44:131–140, 1957. 324
9. K. Akazawa, T. Nakamura, and Y. Palesch. Power of logrank test and Cox regression model in clinical trials with heterogeneous samples. Stat Med, 16:583–597, 1997. 4
10. O. O. Al-Radi, F. E. Harrell, C. A. Caldarone, B. W. McCrindle, J. P. Jacobs, M. G. Williams, G. S. Van Arsdell, and W. G. Williams. Case complexity scores in congenital heart surgery: A comparative study of the Aristotle Basic Complexity score and the Risk Adjustment in Congenital Heart Surg (RACHS-1) system. J Thorac Cardiovasc Surg, 133:865–874, 2007. 215
11. J. M. Alho. On the computation of likelihood ratio and score test based confidence intervals in generalized linear models. Stat Med, 11:923–930, 1992. 214
12. P. D. Allison. Missing Data. Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-136. Sage, Thousand Oaks CA, 2001. 49, 58
13. D. G. Altman. Categorising continuous covariates (letter to the editor). Brit J Cancer, 64:975, 1991. 11, 19
14. D. G. Altman. Suboptimal analysis using 'optimal' cutpoints. Brit J Cancer, 78:556–557, 1998. 19
15. D. G. Altman and P. K. Andersen. A note on the uncertainty of a survival probability estimated from Cox's regression model. Biometrika, 73:722–724, 1986. 11, 517
16. D. G. Altman and P. K. Andersen. Bootstrap investigation of the stability of a Cox regression model. Stat Med, 8:771–783, 1989. 68, 70, 341
17. D. G. Altman, B. Lausen, W. Sauerbrei, and M. Schumacher. Dangers of using 'optimal' cutpoints in the evaluation of prognostic factors. J Nat Cancer Inst, 86:829–835, 1994. 11, 19, 20
18. D. G. Altman and P. Royston. What do we mean by validating a prognostic model? Stat Med, 19:453–473, 2000. 6, 122, 519
19. B. Altschuler. Theory for the measurement of competing risks in animal experiments. Math Biosci, 6:1–11, 1970. 413
20. C. F. Alzola and F. E. Harrell. An Introduction to S and the Hmisc and Design Libraries, 2006. Electronic book, 310 pages. 129
21. G. Ambler, A. R. Brady, and P. Royston. Simplifying a prognostic model: a simulation study based on clinical data. Stat Med, 21(24):3803–3822, Dec. 2002. 121
22. F. Ambrogi, E. Biganzoli, and P. Boracchi. Estimates of clinically useful measures in competing risks survival analysis. Stat Med, 27:6407–6425, 2008. 421
23. P. K. Andersen and R. D. Gill. Cox's regression model for counting processes: A large sample study. Ann Stat, 10:1100–1120, 1982.
